Introduction
The Giving What We Can research team is excited to share the results of our 2024 round of evaluations of charity evaluators and grantmakers!
In this round, we completed three evaluations that will inform our donation recommendations for the 2024 giving season. As with our 2023 round, these evaluations have substantial limitations, but we nevertheless think they are a significant improvement on a landscape in which there were previously no independent evaluations of evaluators’ work.
In this post, we share the key takeaways from each of our 2024 evaluations and link to the full reports. We also include an update explaining our decision to remove The Humane League from our list of recommended programs. Our website has now been updated to reflect the new fund and charity recommendations that came out of these evaluations.
Please also see our website for more context on why and how we evaluate evaluators.
We look forward to your questions and comments!
Key takeaways from each of our 2024 evaluations
The three evaluators that we evaluated in our 2024 round of the evaluating evaluators project were:
- Founders Pledge Global Health and Development Fund
- Animal Charity Evaluators’ Movement Grants
- Animal Charity Evaluators’ Charity Evaluation Program
Global health and wellbeing
Founders Pledge Global Health and Development Fund (FP GHDF)
Based on our evaluation, we have decided not to include FP GHDF in our list of recommended charities and funds at this time, and we do not currently plan to allocate a portion of GWWC’s Global Health and Wellbeing Fund budget to it. However, we still think FP GHDF is worth considering for impact-focused donors, and we will continue to host the program on the GWWC donation platform.
Some takeaways that inform this decision include:
- Our belief that FP GHDF evaluations do not robustly demonstrate that the opportunities they fund exceed FP’s stated bar of 10x GiveDirectly in expectation: some of the BOTECs we reviewed contained errors and insufficiently justified subjective inputs that could push the estimated cost-effectiveness below this threshold.
- Our uncertainty about the future quality of evaluations following the recent departure of the senior researcher responsible for managing the fund, combined with our belief that there are currently insufficient peer-review processes (such as red-teaming) in place to reassure us that fund quality will remain at least consistent through this transition.
Taken together, these issues left us unable to justifiably conclude that the FP GHDF is currently competitive with GiveWell’s recommendations.
While our decision is not to recommend FP GHDF at this time, we would like to emphasise that we did not conclude that the marginal cost-effectiveness of the GHDF is unambiguously not competitive with GiveWell’s recommended charities and funds — in fact, we think the GHDF might be competitive with GiveWell now or in the near future. Instead, our finding is that we can’t currently justifiably conclude that the fund is competitive with GiveWell in expectation, based on the evidence we have seen in our evaluation and the uncertainty surrounding staffing changes.
We also want to highlight positive findings from our evaluation. For example, we think that FP GHDF’s strategy of funding promising early-stage organisations plausibly has significant positive externalities, and the fact that FP GHDF has made several grants to organisations before GiveWell did indicates it has some track record of identifying promising funding opportunities.
For more information, please see our 2024 evaluation report for FP GHDF.
Animal welfare
Animal Charity Evaluators’ Movement Grants (ACE MG)
Based on our evaluation, we have decided to include ACE MG in our list of recommended charities and funds and we expect to allocate half of GWWC’s Effective Animal Advocacy Fund to ACE MG.
The key finding informing this decision is that ACE MG’s recent marginal grants and overall grant decision-making look to be of sufficiently high quality to be competitive with our current top recommendation in animal welfare, EA Funds’ Animal Welfare Fund (EA AWF), as per our 2023 evaluation of EA AWF. This is in large part due to our view that ACE MG has improved meaningfully since our previous evaluation, when we concluded that ACE MG performed slightly less strongly on several proxies for the marginal cost-effectiveness of its grants. In particular, we noted improvements in the quality of ACE MG’s:
- Marginal grantmaking
- Weighting of geographical neglectedness
- Approach to grant selection and sizing
We did find what we think to be room for improvement in ACE MG’s approach to more clearly integrating scope, particularly when comparing between applicants implementing different types of interventions. However, we don’t think this affects ACE MG’s position as — to our knowledge — among the best places to recommend to donors.
Our investigation did not identify ACE MG as justifiably better or worse than EA AWF in terms of marginal cost-effectiveness — based on our 2023 evaluation of EA AWF — so we continue to recommend EA AWF as a donation option alongside ACE MG, and to allocate the other half of our Effective Animal Advocacy Fund to EA AWF.
For more information, please see our 2024 evaluation report for ACE MG.
Animal Charity Evaluators’ Charity Evaluation Program
Based on our evaluation, we have decided not to include ACE’s Recommended Charities or the Recommended Charity Fund (RCF) in our list of recommended charities and funds at this time, nor to allocate a portion of GWWC’s Effective Animal Advocacy Fund to these programs. However, we still think ACE’s Recommended Charities are worth considering for impact-focused donors, and we may consider inviting some of these programs to apply to become supported programs on the GWWC donation platform in early 2025.
Some takeaways that informed our decision include:
- While ACE has made improvements to its approach for capturing scope and cost-effectiveness since our 2023 evaluation, we are not yet convinced that these play a sufficient role in ACE’s ultimate charity recommendation decisions for us to be confident that ACE reliably makes scope-sensitive recommendations.
- We believe that ACE’s Charity Evaluation Program’s current approach does not sufficiently emphasise marginal cost-effectiveness as the most decision-relevant factor in evaluation decisions. For example, ACE primarily evaluates organisations’ existing programs, rather than explicitly focusing its evaluation on the programs most likely to be funded on the margin. This is in direct contrast to ACE’s Movement Grants, which explicitly evaluates the programs to which ACE would be directing funding on the margin.
Additionally, while our decision is not to rely on ACE’s Charity Evaluation Program, we would like to emphasise that we are not claiming that the marginal cost-effectiveness of (any particular) ACE Recommended Charity is clearly not competitive with ACE MG or the EA AWF. In fact, we think it is plausible that some of these charities are competitive with our recommended funds, and they may be great options for donors with particular worldviews who have time to delve deeper into ACE’s materials. Instead, our conclusion is that, based on the evidence we have seen in our evaluation, we can’t justifiably conclude that the Charity Evaluation Program consistently identifies charities competitive with the funding opportunities supported by our current recommendations (ACE MG and EA AWF).
For more information, please see our 2024 evaluation report for ACE’s Charity Evaluation Program.
Additional recommendation updates
The Humane League’s corporate campaigns program
As part of our review of our recommended programs for the 2024 Giving Season, we have made the decision to remove The Humane League’s (THL) corporate campaigns program from our list of recommended programs. Our reason for this is that we believe we can no longer justifiably recommend THL’s program alongside our existing recommendations in animal welfare.
While our decision is to no longer recommend THL’s program, we want to be clear that this does not reflect any negative update on THL’s work. Instead, it reflects the evidence we previously relied on becoming outdated, our maintaining the scope of the evaluating evaluators project, and our keeping to our principles of usefulness, justifiability and transparency.
Specifically, this conclusion is predominantly based on the following:
- We generally base our charity and fund recommendations on our evaluating evaluators project, for reasons explained here. This year, the evaluators and grantmakers we’ll rely on – based on our evaluations – are EA Funds’ Animal Welfare Fund and ACE’s Movement Grants, and neither currently recommends THL as a charity (as neither makes charity recommendations).
- In THL’s case last year, as explained in our report, we went beyond the scope of our usual project by using evidence provided by three further evaluators (Founders Pledge, Rethink Priorities and Open Philanthropy) to supplement the recommendation of an evaluator we had evaluated but decided not to rely on at that point (ACE) in order to ultimately recommend THL’s program. This was based on our overarching principles of usefulness, transparency and justifiability, via our judgement that providing a competitive alternative to the EA AWF – if we transparently and justifiably could – would be valuable.
- However, because we expect the approach of grantmakers/evaluators to change more slowly than the work/programs of an individual charity, we are not comfortable relying on the information we used last year without undertaking a reinvestigation,[1] which we don’t expect to do (see below).
- Given we now have two recommendations in animal welfare, we consider it somewhat less useful to look further into THL (or any other individual program in animal welfare) and we don’t think we can currently justify going beyond the scope of our evaluating evaluators project in the same way as we could last year.
THL’s corporate campaigns remain a supported program on our platform for donors who wish to continue supporting their work, and though we can no longer justifiably recommend them, we still think they are an option worth considering for impact-focused donors with particular worldviews and time to engage with the evidence supporting their corporate campaigns program.
Conclusion
We have created a webpage for those interested in learning more about our 2024 iteration of evaluating evaluators.
In the future, we plan to continue to evaluate evaluators – extending the list beyond the seven we’ve covered across our first two rounds, improving our methodology, and reinvestigating evaluators we’ve previously looked into to see how their approach/work has or has not changed.
[1] Note that we do maintain our recommendation of EA AWF because, as stated above, we expect the quality of evaluations of a grantmaker – and by extension, the quality of the programs it funds when provided with an extra dollar – to change more slowly than the program funded by providing an extra dollar to any individual charity.
FP Research Director here.
I think Aidan and the GWWC team did a very thorough job on their evaluation, and in some respects I think the report serves a valuable function in pushing us towards various kinds of process improvements.
I also understand why GWWC came to the decision they did: to not recommend GHDF as competitive with GiveWell. But I'm also skeptical that any organization other than GiveWell could pass this bar in GHD, since it seems that in the context of the evaluation GiveWell constitutes not just a benchmark for point-estimate CEAs but also a benchmark for various kinds of evaluation practices and levels of certainty.
I think this comes through in three key differences in perspective:
My claim is that, although I'm fairly sure GWWC would not explicitly say "yes" to each of these questions, the implication of their approach suggests otherwise. FP, meanwhile, thinks the answer to each is clearly "no." I should say that GWWC has been quite open in saying that they think GHDF could pass the bar or might even pass it today — but I share other commenters' skepticism that this could be true by GWWC's lights in the context of the report! Obviously, though, we at FP think the GHDF is >10x.
The GHDF is risk-neutral. Consequently, we think that spending time reducing uncertainty about small grants is not worthwhile: it trades off against time that could be spent evaluating and making more plausibly high-EV grants. As Rosie notes in her comment, a principal function of the GHDF has been to provide urgent stopgap funding to organizations that quite often end up actually receiving funding from GW. Spending GW-tier effort getting more certain about $50k-$200k grants literally means that we don't spend that time evaluating new high-EV opportunities. If these organizations die or fail to grow quickly, we miss out on potentially huge upside of the kind that we see in other orgs of which FP has been an early supporter. Rosie lists several such organizations in her comment.
The time and effort that we don't spend matching GiveWell's time expenditure results in higher variance around our EV estimates, and one component of that variance is indeed human error. We should reduce that error rate — but the existence of mistakes isn't prima facie evidence of a lack of rigor. In our view, the rigor lies in optimizing our processes to maximize EV over the long term. This is why we have, for instance, guidelines for time expenditure based on the counterfactual value of researcher time. This programme entails some tolerance for error. I don't think this is special pleading: you can look at GHDF's list of grantees and find a good number that we identified as cost-effective before having that analysis corroborated by later analysis from GiveWell or other donors. This historical giving record, in combination with GWWC's analysis, is what I think prospective GHDF donors should use to decide whether or not to give to the Fund.
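To make that trade-off concrete, here's a deliberately crude sketch (a toy model with entirely invented numbers and names — this is not our actual time-expenditure guideline, just an illustration of the shape of the argument):

```python
# Toy sketch of the time/EV trade-off (all numbers invented): with a fixed
# budget of researcher-hours, deeper vetting per grant means fewer
# opportunities evaluated overall.

HOURS_AVAILABLE = 400  # hypothetical researcher-hours per year

def expected_value_found(hours_per_grant: int, hit_rate: float,
                         value_per_hit: float) -> float:
    """Expected value identified under a screening policy with a fixed time budget."""
    n_evaluated = HOURS_AVAILABLE // hours_per_grant
    return n_evaluated * hit_rate * value_per_hit

# Deep vetting: near-certainty per grant, but only a few grants screened.
deep = expected_value_found(hours_per_grant=100, hit_rate=0.95, value_per_hit=10)
# Quick vetting: noisier conclusions, but ten times the coverage.
quick = expected_value_found(hours_per_grant=10, hit_rate=0.70, value_per_hit=10)

print(f"deep:  {deep:.0f} units expected")   # 4 grants screened  -> 38 units
print(f"quick: {quick:.0f} units expected")  # 40 grants screened -> 280 units
```

Under these made-up parameters, the noisier policy identifies far more expected value per researcher-hour, even after paying for its higher error rate — that is the sense in which tolerance for error can itself be the EV-maximizing choice.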
Finally - a common (and IMO reasonable) criticism of EA-aligned or EA-adjacent organizations is an undue focus on quantification: "looking under the lamppost." We want to avoid this without becoming detached from the base numeric truth, so one particular way we want to avoid it is by allowing difficult-to-quantify considerations to tilt us toward or away from a prospective grant. We do CEAs in nearly every case, but for the GHDF they serve an indicative purpose (as they often do at e.g. Open Phil) rather than a determinative one (as they often do at e.g. GiveWell). Non-quantitative considerations are elaborated and assessed in our internal recommendation template, which GWWC had access to but which I feel they somewhat underweighted in their analysis. These kinds of considerations find their way into our CEAs as well, particularly in the form of subjective inputs that GWWC, for their part, found unjustified.
[highly speculative]
It seems plausible to me that the existence of higher degrees of random error could inflate a more error-tolerant evaluator's CEAs for funded grants as a class. Someone could probably quantify that intuition a whole lot better, but here's one thought experiment:
Suppose ResourceHeavy and QuickMover [which are not intended to be GiveWell and FP!] are evaluating a pool of 100 grant opportunities and have room to fund 16 of them. Each has a policy of selecting the grants that score highest on cost-effectiveness. ResourceHeavy spends a ton of resources and determines the precise cost-effectiveness of each grant opportunity. To keep the hypo simple, let's suppose that all 100 have a true cost effectiveness of 10.00-10.09 Units, and ResourceHeavy nails it on each candidate. QuickMover's results, in contrast, include a normally-distributed error with a mean of 0 and a standard deviation of 3.
In this hypothetical, QuickMover is the more efficient operator because the underlying opportunities were ~indistinguishable anyway. However, QuickMover will erroneously claim that its selected projects have a cost-effectiveness of ~13+ Units because it unknowingly selected the 16 projects with the highest positive error terms (i.e., those with an error of +1 SD or above). Moreover, the random distribution of error determined which grants got funded and which did not -- which is OK here since all candidates were ~indistinguishable but will be problematic in real-world situations.
While the hypo is unrealistic in some ways, it seems that given a significant error term, which grants clear a 10-Unit bar may be strongly influenced by random error, and that might undermine confidence in QuickMover's selections. Moreover, significant error could result in inflated CEAs on funded grants as a class (as opposed to all evaluated grants as a class) because the error is in some ways a one-way ratchet -- grants with significant negative error terms generally don't get funded.
I'm sure someone with better quant skills than I could emulate a grant pool with variable cost-effectiveness in addition to a variable error term. And maybe these kinds of issues, even if they exist outside of thought experiments, could be too small in practice to matter much?
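Here's one rough attempt at that emulation in Python (parameter names are mine; the pool and error parameters match the hypo above, and a comment marks where a variable-cost-effectiveness pool could be swapped in):

```python
import numpy as np

# Monte Carlo version of the hypo: 100 candidates, fund the top 16,
# QuickMover's CEAs carry normally distributed error (mean 0, SD 3).
rng = np.random.default_rng(0)
N_POOL, N_FUNDED, SD_ERROR, N_TRIALS = 100, 16, 3.0, 10_000

inflation = []
for _ in range(N_TRIALS):
    true_ce = np.full(N_POOL, 10.0)  # ~indistinguishable candidates, as in the hypo
    # (swap in e.g. rng.normal(10, 1, N_POOL) for a pool with variable
    # cost-effectiveness in addition to the error term)
    estimate = true_ce + rng.normal(0.0, SD_ERROR, N_POOL)  # noisy CEAs
    funded = np.argsort(estimate)[-N_FUNDED:]               # fund the 16 highest scores
    inflation.append(estimate[funded].mean() - true_ce[funded].mean())

print(f"average claimed-minus-true CE of funded grants: {np.mean(inflation):.2f}")
# With these parameters this lands around +4.6 Units, i.e. a claimed ~14.6
# vs a true ~10, consistent with the "~13+" figure in the hypo.
```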
It's definitely true that all else equal, uncertainty inflates CEAs of funded grants, for the reasons you identify. (This is an example of the optimizer's curse.) However:
Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.
I did want to flag that you are correct in anticipating that we do not agree with the three differences in perspective that you note here, nor do we think our approach implies that we do:
1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:
[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FP’s BOTECs for their evaluations.
Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.
2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.
3) We do not think maximising calculated EV for each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant of some errors and ‘misses’, which is why we checked three grants rather than only one. Even after finding issues in the first grant, we were still open to relying on FP GHDF if these issues seemed likely to occur only to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.
I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.
Hey Aidan,
I want to acknowledge my potential biases for any new comment thread readers (I used to be the senior researcher running the fund at FP, most or all of the errors highlighted in the report are mine, and I now work at GiveWell). These are personal views.
I think getting further scrutiny of, and engagement with, key grantmaking cruxes is really valuable. I also think the discussion this has prompted is cool. A few points from my perspective:
As Matt’s comment points out, there is a historical track record for many of these grants. Some have gone on to be GiveWell-supported, or (imo) have otherwise demonstrated success in a way that suggests they were a ‘hit’. In fact, with the caveat that there are a good number of recent ones where it’s too early to tell, there hasn’t yet been one that I consider a ‘miss’. Is it correct to update primarily from 3 spot checks of early-stage BOTECs (my read of this report) versus updating from what actually happened after the grant was made? Is this risking Goodharting?
Is this really comparing like for like? In my view, small grants shouldn’t require as strong an evidence base as, say, a multimillion-dollar grant, mainly for the time-expenditure reasons that Matt points out. I am concerned about whether this report gets us closer to a point where (due to the level of rigour, and therefore time expenditure, required) the incentives for grantmaking orgs are to only make really large grants. I think this systematically disadvantages smaller orgs, and I think that is a negative thing (I guess your view here partially depends on your view on point ‘3’ below).
In my view, a crucial crux here is really about the value of supporting early-stage stuff, alongside other potentially riskier items, such as advocacy and giving multipliers. I am genuinely uncertain, and think that smart and reasonable people can disagree here. But I agree with Matt’s point that there’s significant upside through potentially generating large future room for funding at high cost-effectiveness. This kind of long-term optionality benefit isn’t typically included in an early-stage BOTEC (because doing a full VOI is time-consuming), and I think it’s somewhat underweighted in this report.
I no longer have access to the BOTECs to check (since I’m no longer at FP), and again I think the focus on BOTECs is a bit misplaced. I do want to briefly acknowledge, though, that I’m not sure all of these are actually errors (but I still think it’s true that there are likely some BOTEC errors, and I think this would be true for many/most orgs making small grants).
We would like to extend our gratitude to Giving What We Can (GWWC) for conducting the "Evaluating the Evaluators" exercise for a second consecutive year. We value the constructive dialogue with GWWC and their insights into our work. While we are disappointed that GWWC has decided not to defer to our charity recommendations this year, we are thrilled that they have recognized our Movement Grants program as an effective giving opportunity alongside the EA Animal Welfare Fund.
Movement Grants
After reflecting on GWWC’s 2023 evaluation of our Movement Grants (MG) program we made several adjustments, all of which are noted in GWWC’s 2024 report. We’re delighted to see that the refinements we made to our program this year have led to grantmaking decisions that meet GWWC’s bar for marginal cost-effectiveness and that they will recommend our MG program on their platform and allocate half of their Effective Animal Advocacy Fund to Movement Grants.
As noted by GWWC, ACE’s MG program is unique in its aims to fund underserved segments of the global animal advocacy movement and address two key limitations to effectiveness within the movement:
For impact-focused donors seeking opportunities to build an evidence-based and resilient global animal advocacy movement, our Movement Grants program is an effective giving opportunity which supports brilliant animal advocates all over the world.
Alongside their recommendation of our MG program, GWWC has outlined several areas for improvement that we are grateful for and will reflect on.
We agree with these suggestions by GWWC:
Improving the documentation of our reasoning for making grant decisions—this is mainly related to our internal processes that don’t have any bearing on our grant decisions; however, we agree with GWWC that despite our diligent record keeping for how our thinking evolves through the grant review process, we need a better record that summarizes, in one place, the main rationale and cruxes for each grant decision. This is something we intend to implement in our next granting round.
We also want to note the following challenge:
GWWC recommends that we introduce a clearer framework for prioritizing between interventions. We agree with this recommendation—of the possible interventions available to help animals, some are already excluded from applying or rejected at an early stage from our grantmaking based on the scope of impact. However, while we intend to make further improvements in scope comparison between interventions, there remain challenges due to the many externalities that affect intervention effectiveness and our ability to estimate them.

GWWC notes in the report that we appear resistant to doing this because it would be unhelpfully speculative. We want to clarify that we are willing to make speculations where we think they will be useful while highlighting the challenges. There is a difference between the more speculative forward-looking cost-effectiveness analyses (CEAs) we would be undertaking as a grantmaker compared with the CEAs the Charity Evaluations team undertakes on completed work. To overcome this, we may also consider comparing the known cost-effectiveness of the most similar organizations’ previous work or leveraging CEAs that have used the total available information on an intervention (e.g. this estimate).

However, we remain cautious about spending time and resources on trying to find comparable cost estimates between interventions when doing so might not sufficiently increase the overall marginal cost-effectiveness of our grant decisions. We also want to note that this is a challenge for any animal advocacy funder, not just ACE. This is an area where we expect to continue to try different approaches and improve year-on-year, balancing available information and our team’s capacity.
We are grateful for the rare opportunity to reflect deeply on our work and to learn from GWWC’s perspectives so that we can award grants that are the most impactful for animals. We are especially thankful too for the larger GWWC and EA community that is willing to support highly promising projects to help some of the most neglected individuals who suffer greatly.
Charity Evaluations
On the other hand, we are disappointed that GWWC does not find our Charity Evaluations program justifiably competitive with MG (and the EA Animal Welfare Fund) and believe that donors might miss out on some of the most impactful donation opportunities because of GWWC’s decision. We will elaborate on the relationship between our Charity Evaluations and Movement Grants below, but first address some points specific to Charity Evaluations.
We agree with some of GWWC’s conclusions and suggestions for improvement, which appear in their 2024 report. We think that focusing on these will improve the quality of our recommendations moving forward:
However, there are also areas where we disagree with GWWC’s conclusions. While we acknowledge these parts of our methods have room for improvement, we think the changes that they suggest may not make a meaningful difference to the quality of our recommendations:
GWWC also suggests some broader strategic shifts in our programs that we plan to consider as a part of upcoming strategic planning. While these changes would help align our Charity Evaluations program more closely with GWWC’s criteria, we’re currently unsure if they would do the most good for the animal advocacy movement and for animals. These include:
Moving forward, we will continue evolving the Charity Evaluations program to find the organizations that can do the most good with additional donations, and we thank GWWC for critically engaging with our work. We also appreciate that they acknowledge the difficulties of our work and the inherent differences between evaluators and funders in the animal advocacy space. However, given those difficulties, we’re currently not sure whether ACE or any other evaluator that recommends whole charities in this space would be seen by GWWC as competitive with charitable funds that give restricted grants. Because of this, we’re not sure whether GWWC’s current approach leads to the best outcomes for animals.
How ACE views Movement Grants vs. Charity Recommendations
We want to acknowledge that the language GWWC uses implies that they see our Movement Grants (MGs) and Charity Evaluations programs as “competitive” with each other. This is not a view we share—we see them as complementary.
Although there’s some overlap between charities that are a fit for each program, they serve different purposes:
The programs are complementary and supplement each other:
They also serve different donors:
We are proud to support all of our current Recommended Charities and Movement Grantees, and would like to take this opportunity to celebrate the impactful work they do to help make the world a kinder place for animals.
- ACE Team
Thank you for your comment! We’ve really appreciated the open and thoughtful way ACE engaged with us throughout these evaluations.
We are excited to be adding Movement Grants to our list of recommended programs this year, and we think the improvements we observed since our last evaluation are a testament to ACE’s constructive approach to engaging with feedback. We are also excited to continue highlighting opportunities like the Recommended Charity Fund and several of ACE’s Recommendations as supported programs on our platform.
I’ve been going through the evaluation reports and it seems like GWWC might not be as confident in Longview’s Emerging Challenges Fund or the EA Long-Term Future Fund as they are in their choices for GHD and Animal Welfare. The reports for these funds often include some uncertainties, like:
On the other hand, the Founders Pledge GHD fund wasn’t fully recommended due to more specific methodological issues:
Until I read various posts around the forum and personally looked into what LTFF in particular was funding, I was under the impression—partly from GWWC’s messaging—that the LTFF was at least comparable to a GiveWell or even an ACE. This is partly because GWWC usually recommend their GCR funds at the same time as these other funds.
It might be on me for having the wrong assumptions, so I wrote out my chain of thinking, and I’m keen to find out where we disagree:
(I originally posted this to the 2024 recommendations but thought it might be more constructive / less likely to cause any issues over in this thread)
(I no longer work at GWWC, but wrote the reports on the LTFF/ECF, and was involved in the first round of evaluations more generally.)
In general, I think GWWC's goal here is to "to support donors in having the highest expected impact given their worldview" which can come apart from supporting donors to give to the most well-researched/vetted funding opportunities. For instance, if you have a longtermist worldview, or perhaps take AI x-risk very seriously, then I'd guess you'd still want to give to the LTFF/ECF even if you thought the quality of their evaluations was lower than GiveWell's.
Some of this is discussed in "Why and how GWWC evaluates evaluators" in the limitations section:
And also in each of the individual reports, e.g. from the ACE MG report:
Mmm, so maybe the crux is at (3) or (4)? I think that GWWC may be assuming too much about how viewers are interpreting the messaging and presentation around the evaluations. I think there is probably a way to signal the differences in evaluation strength while still maintaining the BYO worldview approach?
Just speaking for myself, I'd guess those would be the cruxes, though I don't personally see easy fixes. I also worry that you could err on being too cautious, by potentially adding warning labels that give people an overly negative impression compared to the underlying reality. I'm curious if there are examples where you think GWWC could strike a better balance.
I think this might be symptomatic of a broader challenge for effective giving for GCR, which is that most of the canonical arguments for focusing on cost-effectiveness involve GHW-specific examples, that don't clearly generalize to the GCR space. But I don't think that indicates you shouldn't give to GCR, or care about cost-effectiveness in the GCR space — from a very plausible worldview (or at least, the worldview I have!) the GCR-focused funding opportunities are the most impactful funding opportunities available. It's just that the kind of reasoning underlying those recommendations/evaluations are quite different.
I am not sure I understand the claim being made here. Do you believe this to be the case because of a tension between hits-based and cost-effective giving?
If so, I may disagree with the point. Fundamentally, if you're a "hits" grantmaker, you still care about: (1) the amount of impact resulting from a hit; (2) the odds of getting a hit; (3) indicators that may lead up to a hit; (4) the marginal impact of your grant.
1 & 2) require a solid theory of change and BOTEC EV calculations (see the toy sketch at the end of this comment)
3) requires good M&E
Fundamentally, I wouldn't see much of a tension between hits based and cost-effective giving, other than a much higher tolerance for risk.
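As a concrete illustration of the kind of BOTEC EV calculation I mean, here's a toy sketch in Python (every number, including the funding bar, is invented purely for illustration):

```python
# Toy BOTEC for a hits-based grant (all numbers invented):
# expected value = P(hit) * impact if it hits, compared against a funding bar.

grant_size = 100_000        # USD
p_hit = 0.10                # subjective odds the "hit" materialises
impact_if_hit = 20_000      # e.g. DALYs averted in the success case
bar = 0.01                  # hypothetical bar in DALYs averted per USD

ev = p_hit * impact_if_hit          # 2,000 DALYs in expectation
ev_per_dollar = ev / grant_size     # 0.02 DALYs per USD
decision = "fund" if ev_per_dollar > bar else "pass"
print(f"EV/$ = {ev_per_dollar:.3f} vs bar {bar:.3f} -> {decision}")
```

The point is that the same EV arithmetic applies whether or not you're hits-based; risk tolerance only changes how low a P(hit) you're willing to accept for a given upside.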
I suppose to tack onto Elliot’s answer, I’m curious about what you see the differences in reasoning to be. If it is merely that GCR giving opportunities are more hits-based / high variance, I could see, for example, a small label being applied on the GWWC website next to higher-risk opportunities with a link to something like the explanations you’ve written above (and the evaluation reports).
That kind of labelling feels like only a quantitative difference from the current binary evaluations (as in, currently GWWC signals inclusion/exclusion, but could extend that to signal for strength of evaluation or risk of opportunity).
Good job on highlighting this. While I very much understand GWWC's angle of approach here, I can see that there's essentially a dynamic that could be playing out whereby some areas (Animal Welfare and Global Development) get increasingly rigorous, while other areas (Longtermist problem-areas and Meta-EA) don't receive the same benefit.
Thanks for the comment! While we think it could be correct that the quality of evaluations differs between our recommendations in different cause areas, my view is that the evaluating evaluators project applies pressure to increase the strength of evaluations across all cause areas. In our evaluations we communicate areas where we think evaluators can improve. Because we are evaluating multiple options in each cause area, if in future evaluations we find one of our evaluators has improved and another has not, then the latter evaluator is less likely to be recommended in future, which provides an incentive for both evaluators to improve their processes over time.
Thanks for your comment, Huw! I think Michael has done a great job explaining GWWC’s position on this, but please let us know if we can offer any clarifications.
Thank you for these thorough reports and the project as a whole! As Chair of the EA Animal Welfare Fund, I'm very grateful for GWWC's continued work evaluating evaluators and grantmakers in the animal welfare space and personally grateful for their work across all cause areas. I think this sort of meta-evaluation is incredibly valuable. This year, I'm particularly excited to see ACE's Movement Grants join the recommended list - their improvements are great developments for the field. Last year's evaluation of AWF was also very helpful for our team, and we're looking forward to the re-evaluation next year. It's encouraging to see the evaluation and funding ecosystem becoming increasingly robust. Thank you for your work!
Thanks so much for your comment, Karolina! We are looking forward to re-evaluating AWF next year.
From your 2024 evaluator research page:
I think you did great based on the time you spent. Going through your evaluations, I thought to myself "good point" (or similar) so many times! Have you considered fundraising specifically for your evaluator research? I would be interested in supporting you as long as some (e.g. 1/3) of the additional funds go towards evaluating animal welfare evaluators.
Thanks for the positive feedback! We are actively considering the future of the GWWC research team, including whether we invest additional resources in future evaluating evaluators projects. To my knowledge, we have not considered fundraising specifically for the evaluating evaluators project. This might be an option we consider at some point, but I think we are probably unlikely to do so in the near future.
Thanks, Aidan and Sjir! I really like this project.
I do not understand why you "think the GHDF might be competitive with GiveWell now". From your report, it seems quite clear to me that donating to GiveWell's funds is better now. You "looked into [3] grants that were relatively large (by FP GHDF standards), which we expect to be higher effort/quality", and found major errors in 2 grants, and a major methodological oversight in the other.
Worth pointing out that the FP staff who could reply to this are on Thanksgiving break, so a reply will probably take until next week.
Hey Vasco, these are my personal thoughts and not FP’s (I have now left FP, and anything FP says should take precedence). I have pretty limited capacity to respond, but a few quick notes—
First, I think it’s totally true that there are some BOTEC errors, many/most of them mine (thank you, GWWC, for spotting them — it’s so crucial to a well-functioning ecosystem, and more selfishly, to improving my skills as a grantmaker. I really value this!)
At the same time, these are hugely rough BOTECs that were never meant to be rigorous CEAs: they were used as decision-making tools to enable quick decision-making under limited capacity (I do not take the exact numbers seriously: I expect they're wrong in both directions), with many factors beyond the BOTEC going into grantmaking decisions.
I don’t want to make judgments about whether or not the fund (while I was there) was surpassing GiveWell— super happy to leave this to others. I was focused on funders who would not counterfactually give to GW, meaning that this was less decision-relevant for me.
I think it's helpful to look at the grant history from FP GHDF. Here are all the grants that I think have been made by FP GHDF since Jan 2023; apologies if I’ve missed any:
* New Incentives, Sightsavers, Pure Earth, Evidence Action
* r.i.c.e, FEM, Suvita (roughly: currently scaling orgs that were considerably smaller/more early-stage when we originally granted)
* 4 are ecosystem multiplier-y grants (Giving Multiplier, TLYCS, Effective Altruism Australia, Effektiv Spenden)
* 1 was an advocacy grant to 1DaySooner (malaria vax roll out stuff), 1 was an advocacy grant to LEEP
* 5 are recent young orgs, that we think are promising but ofc supporting young orgs is hit and miss (Ansh, Taimaka, Essential, IMPALA, HealthLearn)
* 1 was a deworming research grant
* 1 was an anti-corruption journalism grant which we think is promising due to economic growth impacts (OCCRP)
I think it's plausible that I spent too little time on these grant evals, and this probably contributed to BOTEC errors. But I feel pretty good about the actual decision-making, although I am very biased:
* At least 2 or 3 of these struck me as being really time-sensitive (all grants are time-sensitive in a way, but I'm talking about ‘might have to shut down some or all operations’ or ‘there is a time-sensitive scaling opportunity’).
* I think there is a bit of a gap for early-ish funding, and benefits to opening up more room for funding by scaling these orgs (i.e. funding orgs beyond seed funding, but before they can absorb or have the track record for multi-million-dollar grants). It's still early days, but I feel pretty good about the trajectories of the young orgs that FP GHDF supported.
* Having a few high-EV, ‘big if true’ grants feels reasonable to me (advocacy, economic growth, R&D).
I hope this context is useful, and note that I can’t speak to FP’s current/future plans for the FP GHDF. I value the engagement, thanks.
Thanks for the comment! While we did find issues that we think imply Founders Pledge’s BOTECs don’t convincingly show that the FP GHDF’s grants surpass 10x GiveDirectly in expectation in terms of marginal cost-effectiveness, we also don’t think we can justifiably conclude that these grants fail to pass this bar. As mentioned in the report, this is partly because:
Rosie’s comment also covers some other considerations that bear on this and provides useful context that is relevant here.
Hi there, writing on behalf of The Humane League!
THL is grateful to GWWC for their support of our corporate campaigns to reduce farmed animal suffering, previously as a recommended program and moving forward as a supported program. GWWC provides an important service to the philanthropic community, and the funds they have directed to THL have driven highly impactful victories to reduce animal suffering, including a recent groundbreaking victory that will spare 700,000 hens from immense suffering annually once implemented and kickstart the cage-free supply in Asia, a neglected yet highly important region for corporate campaigns.
While we are disappointed that our corporate campaigns to reduce farmed animal suffering will no longer be listed as a recommended program, we understand GWWC’s decision to limit the scope of their recommendations in line with their evaluating evaluators project. We hope that donors will continue to support our corporate campaigns and the Open Wing Alliance as highly impactful giving opportunities to reduce the suffering of millions of sentient beings and build an effective global animal advocacy movement, given the strong marginal impact of our supported regranting programs in neglected regions. We are grateful that GWWC will continue accepting gifts on THL’s behalf; in addition gifts can be made directly to THL, or through our other international charity partners for donors outside the US seeking tax deductions.
If you have any questions about THL’s programs or high-impact funding opportunities, please reach out to Caroline Mills at cmills@thehumaneleague.org.
With gratitude,
The THL team
Thanks for your comment, Caroline! We are excited to continue hosting THL as a supported program on the GWWC platform, so donors can continue supporting your important work.
Thank you Sjir and Aidan for this excellent work! I think it's quite valuable for community epistemics to have someone doing this kind of high-quality meta-evaluation. And it seems like your dialogue with ACE has been very productive.
Selfishly, as someone who makes a number of donations in the animal welfare space, I'm also excited by the possibility of some more top animal charities becoming supported programs on the donation platform :)
Thanks so much for your comment, we appreciate the positive feedback! We plan to open applications for our supported programs status in Q1 2025 and intend to invite ACE-recommended charities to apply. We will make decisions about which charities we onboard based on our inclusion criteria, which we’ll update early next year but expect to not change dramatically.
I very much agree with your decision of not recommending ACE's RCF, and I think your evaluation was great.
Did you have the chance to look into the pain intensities Ambitious Impact uses to calculate SADs? They are not public, but you can ask Vicky Cox for the sheet. I think they hugely underweight excruciating pain, such that interventions addressing this (like Shrimp Welfare Project’s Humane Slaughter Initiative) have their cost-effectiveness underestimated a lot. Feel free to ask Vicky for my comments on the pain intensities.
Nitpick. ACE's additional funds discount for low uncertainty should be higher than 0 % (and lower than 20 %), or there is a typo in the table below?
Thanks for the comment! We didn’t look deeply into the SADs framework as part of our evaluation, as we didn’t think this would have been likely to change our final decision. It is possible we will look into this more in future evaluations. I currently expect use of this framework to be substantially preferable to a status quo where there is not a set of conceptually meaningful units for comparing animal welfare interventions.
On ACE’s additional funds discounts, our understanding is that the 0% discount for low uncertainty is not a typo. ACE lists their uncertainty categories and the corresponding discounts under Criterion 2 on their evaluation criteria page.
Love to see these reports!
I have two suggestions/requests for 'crosstabs' on this info (which is naturally organised by evaluator, because that's what the project is!):
Thanks for the comment — we appreciate the suggestions!
With respect to your first suggestion, I want to clarify that our goal with this project is to identify evaluators that recommend among the most cost-effective opportunities in each cause area according to a sufficiently plausible worldview. This means among our recommendations we don’t have a view about which is more cost-effective, and we don’t try to rank the evaluators that we don’t choose to rely on. That said, I can think of two resources that might somewhat address your suggestion:
With respect to your second suggestion, while we don’t include a checklist as such, we try to include the major areas for improvement in the conclusion section of each report. In future we might consider organising these more clearly and making them more prominent.
From your evaluation of ACE MG:
Which diet change interventions are more cost-effective than cage-free corporate campaigns? Meaningfully reducing meat consumption is an unsolved problem, and I suspect many diet change interventions are harmful due to leading to the replacement of beef and pork with poultry meat, eggs (which can be part of a vegetarian diet), fish and other seafood.
In any case, I do not think this is that important for your recommendation. You found only 11.1 % of the money granted by ACE MG went to diet change interventions, and it does not look like marginal grants were super related to diet change.
Thanks for the comment! I first want to highlight that in our report we are specifically talking about institutional diet change interventions that reduce animal product consumption by replacing institutional (e.g., school) meals containing animal products with meals that don’t. This approach, which constitutes the majority of diet change programs that ACE MG funds, doesn’t necessarily involve convincing individuals to make conscious changes to their consumption habits.
Our understanding of a common view among the experts we consulted is that diet change interventions are generally not competitive with promising welfare asks in terms of cost-effectiveness, but that some of the most promising institutional diet change interventions plausibly could be. For example, I think some of our experts would have considered the grant ACE MG made to the Plant-Based Universities campaign worth funding. Reasons for this include:
As noted in the report, not all experts agreed that the institutional diet change interventions were on average competitive with the welfare interventions ACE MG funded. However, as you noted, this probably has a fairly limited impact on how cost-effective ACE MG is on the margin, not least because these grants made up a small fraction of ACE MG’s 2024 funding.
Thanks, Aidan. For reference, I estimated corporate campaigns for chicken welfare are 25.1 times as cost-effective as School Plates, which is a program aiming to increase the consumption of plant-based foods at schools and universities in the United Kingdom.
I think the rate of change is overwhelmingly a function of size, not of whether it concerns grantmakers/evaluators or work/programs. I would expect THL's work/program to change more slowly than the Animal Welfare Fund's. In any case, I agree with your decision not to recommend THL. As you say, it is more consistent with the scope of your project:
Hi Vasco, thanks for the comment! I should clarify that we are saying we expect the marginal cost-effectiveness of impact-focused evaluators to change more slowly than that of charities. All else equal, we think size is plausibly a useful heuristic. However, because we are looking at the margin, both the program itself and its funding situation can change. And THL hasn’t been evaluated on how it allocates funding on the margin or starts new programs, but only on the quality of its marginal programs at the time of evaluation, so there is a less robust signal there than for EA AWF, which we did evaluate on the basis of how it allocates funding on the margin. I hope that makes sense!