Hey everyone, this is an old draft from 2022 for a post I’ve been hand-wringing about publishing in any form. But it was the most popular draft amnesty idea from last year that I never followed through on, and I’ve finally reached the point where I think something basic like this post is good enough to get my contribution to this discussion out there:

Well before I regularly visited the EA Forum, I remember that one of the first debates I ran into there was the 2021 criticism of ACE’s then-recent social justice activities. The anonymous user “Hypatia” published a set of criticisms of some of these activities, including accusations that ACE promoted unrigorous, vague anti-racist statements and canceled a speaker who was critical of BLM. Some people involved with ACE contested how these events were characterized, and as of today much of the debate has cooled down without anyone changing their minds very much.

One of the criticisms Hypatia brought up was that ACE was evaluating organizations, in part, based on how good it thought their values were, not just their effectiveness at helping animals. Hypatia pointed out that it seemed valuable to have an organization that evaluated pure animal welfare effectiveness, without assuming other values on the part of donors; donors could then decide for themselves what else mattered to them. At the time, this seemed mostly reasonable to me, and I couldn’t think of much wrong with the reasoning. One sort of pushback I’ve seen from ACE, for instance from Leah Edgerton in her interview with Spencer Greenberg, is to deny the premise: looking at aspects of an organization’s internal culture helps predict harder-to-measure aspects of how healthy the organization is, and so how much good it can ultimately do for animals. Another piece of pushback, which I have not seen but suspect would be commonsense to many people, is to bite the bullet: yes, as an organization we are willing to look at things other than pure animal welfare. If an animal rights organization were going around setting factory farmers’ houses on fire, a good evaluation organization would be perfectly reasonable in deciding not to rank it at all, even if it did effectively help animals.

Both tactics seem respectable in theory, but in practice they carry a burden of proof ACE may not meet. Is failure to meet the particular norms ACE prioritizes really so predictive of eventual failure? Hypatia is certainly skeptical in some cases, and I find it hard to believe that these social justice norms are precisely the ones you would look for if your sole goal was predicting organizational stability. And what about the arson example? It is a proof of concept that conflicts of values can sometimes be strong enough to justify disqualification, but these organizations don’t seem to be doing anything a fraction as serious. After listening to the Edgerton interview, I thought about this debate for the first time in a while, and found that I actually had something to add that, as far as I know, hasn’t been mentioned yet. It seems to me like ACE’s strongest defense, and more generally a consideration EA cause evaluators should weigh more.

One of my favorite recent (as of my initial drafting) posts on EA culture from the forum was “Unsurprising things about the EA movement that surprised me” by Ada-Maaria Hyvarinen. In it, she raises the very relatable point that:

“In particular, there is no secret EA database of estimates of the effectiveness of every possible action (sadly). When you tell people effective altruism is about finding effective, research-based ways of doing good, it is a natural reaction to ask: ‘so, what are some good ways of reducing pollution in the Baltic Sea/getting more girls into competitive programming/helping people affected by [current crisis that is on the news]’ or ‘so, what does EA think of the effectiveness of [my favorite charity]’. Here, the honest answer is often ‘nobody in EA knows’, and it is easy to sound dismissive by adding ‘and we are not going to find out anytime soon, since it is obvious that the thing you wanted to know about is not going to be the most effective thing anyway’.”

Maybe it is obvious to some people, but Hypatia’s reaction to ACE looking at values other than animal effectiveness makes the most sense in a world where this point is not true: a world in which all EA organizations are sort of like Charity Navigator and seek, as a primary part of their mission, to evaluate as comprehensive a list of charities as possible. Saying that ACE should stick to animal effectiveness is sort of like asking for an organizational model in which every cause evaluator is engaged in a different effective altruism. Insofar as some measure of how good an organization is doesn’t fit the specific version of EA they are dedicated to, it is simply none of their business. ACE doesn’t look at the impact of its organizations on the global poor, GiveWell doesn’t look at the impact of its organizations on animals, and neither asks how good or bad their organizations are for squishier, harder-to-measure values like promoting a more tolerant, welcoming movement culture. In the real world, where EA evaluators aren’t like Charity Navigator, no one is checking the impact of GiveDirectly on chickens, because it isn’t a candidate for the best charity in the world according to chicken-interested effective altruism.

I cannot think of any reason to like this world, no justification other than “it is not my problem.” The alternative solution, the one that seems to be the default expectation of many EA organizations right now, is a “buyer beware” mentality, in which people looking to donate to a recommended cause must personally decide against it based on their own research into how well the organization fits their values.

It seems to me that a world in which individual donors must determine for themselves whether organizations fit their values, possibly just by checking how an organization talks about itself or obvious things about its approach (whether it sets houses on fire), is strictly worse than a world in which EA evaluators are expected to, in some sense, finish the job: to consider the many values that might go into deciding where to donate, and to carefully investigate all of them for their most promising choices. Admittedly, this point has a complicated relationship to what ACE’s initiatives actually look like. ACE is, after all, moving money through grants, not just publishing evaluations. I think this is a more complicated issue, though similar considerations to the ones I bring up here will, I think, vindicate some approach that looks at values other than just animal welfare (regardless of whether this looks like the stuff ACE is currently looking at, or should look different in some way).

ACE could also be more upfront about separating out the results of these evaluations so that donors can more easily weight them for themselves. It could also look into a wider range of values, or different ones if you dislike ACE’s value picks for independent reasons. There is a ton of room for improvement, but this is roughly how my ranking of EA charity evaluator approaches currently lines up, from best to worst:

  1. There are charity evaluators for every single important value: global health, animal welfare, existential risk, social justice, and everything else donors have a reason to care about. They all do their research as broadly as Charity Navigator, and as deeply as effective altruist charity evaluators. A separate, independent evaluator aggregates these results and can give you rankings based on the weights you specify for each value (see the sketch after this list).
  2. There are separate charity evaluators for different types of values. They all look at the top picks of one another, and give thorough feedback about how they interact with the values the reviewing organization is most interested in.
  3. An independent charity evaluator is dedicated to doing research into the top causes of different evaluators, and providing reports of how these organizations interact with values other than the ones each evaluator prioritizes.
  4. Each charity evaluator organization evaluates only the top organizations in its own field, but investigates how each of these organizations interact with values other than the ones the evaluator is meant to most prioritize, and reports on this.
  5. Each charity evaluator looks into the top organizations in its own field, looks at how these organizations relate to other values it thinks are important, and publishes recommendations that incorporate both (being transparent about how it weighs different values).
  6. Each charity evaluator only looks at one value, like animal welfare. It finds top charities in this field, ranks them, and just shows this to the public.
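To make option 1 concrete, here is a minimal sketch of the arithmetic the independent aggregator might run, written in Python. Everything in it is hypothetical: the charity names, the per-value scores, and the donor weights are invented for illustration, since no such database of scores actually exists.

```python
# Minimal sketch of the "independent aggregator" from option 1.
# All names, scores, and weights are hypothetical placeholders; in
# reality each per-value score would come from a specialist evaluator.

from typing import Dict, List

# Per-charity scores on each value, normalized by the specialist
# evaluators to a shared 0-10 scale so they can be compared.
charity_scores: Dict[str, Dict[str, float]] = {
    "Charity A": {"animal_welfare": 9.1, "global_health": 2.0, "movement_culture": 6.5},
    "Charity B": {"animal_welfare": 4.0, "global_health": 8.7, "movement_culture": 7.2},
    "Charity C": {"animal_welfare": 7.5, "global_health": 3.1, "movement_culture": 3.0},
}

def rank_charities(scores: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[str]:
    """Rank charities by the donor's weighted sum across values.

    A missing score is treated as 0 ("no known impact on this value");
    a real aggregator would need a more careful policy than that.
    """
    def weighted_total(per_value: Dict[str, float]) -> float:
        return sum(w * per_value.get(value, 0.0) for value, w in weights.items())

    return sorted(scores, key=lambda name: weighted_total(scores[name]), reverse=True)

# A donor who cares mostly, but not exclusively, about animals:
donor_weights = {"animal_welfare": 0.7, "global_health": 0.2, "movement_culture": 0.1}
print(rank_charities(charity_scores, donor_weights))
# -> ['Charity A', 'Charity C', 'Charity B']
```

The hard part of option 1 is obviously producing the scores, not combining them; the point of the sketch is just that once specialist evaluators publish comparable per-value scores, letting each donor supply their own weights is trivial.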

ACE seemed to be in an uncomfortable middle ground between the 4th and 5th best options. Arguably it is worse than that: it only looks at a handful of values other than animal welfare, and if Edgerton’s testimony is representative, it does not even concern itself with these other values except as instruments to animal welfare. I find this latter claim hard to believe, and it is not necessary to justify ACE’s practices (or some idealized version of them), but if I do believe this claim about how ACE considers other values, I think ACE has reason to go even further.

Even if ACE’s practices are subpar, I think there is reason to expect them to be strictly better than 6, which I see as the default for charity evaluators. Meanwhile, 1 is pretty much impossible. 2 and 3 both seem possible, but 2 is just not done and would require a broad culture shift within EA; 3 is easier, but does require a new organization. Something like 3 happens in a less focused way through scattered forum posts and some global priorities research, but not in the most thorough way it could. All of these seem pretty good to me in the grand scheme of things, especially compared to 6, and thinking about them gives me the impression that there is a great deal of room for impactful entrepreneurship in this field.

I’m not sure why this point doesn’t seem much discussed. One possibility is that it just isn’t a good idea. For instance, maybe how charities fare on values other than the ones they target is simply harder to evaluate than how they fare on their target values, or maybe when this indirect impact is easy to measure, it is so obvious that reports from a charity evaluator aren’t necessary. Taken together, this could mean the resources that would go into such a project wouldn’t be worth it.

That said, the fact that ACE did come to different conclusions about these charities when it looked at values not directly related to animal welfare makes me think this is not so obvious. Another possibility is that, although I haven’t really heard discussion of it, it is something people talk about, or that these organizations take part in, and I just don’t notice. I can certainly think of organizations that do things like this, but mostly they are grant-giving organizations like Open Philanthropy, which I don’t think quite fit what I’m talking about here. Regardless, I thought these thoughts were worth bringing up in case they really do touch on a neglected factor, not just in the ACE debate, but in charity evaluation more broadly.

Comments



Thanks for this interesting perspective on how to balance different values within the work of evaluations, Devin. Considering you drafted this in 2022, we do want to note that a lot has changed at ACE in the last three years, not least of which has been a shift to new leadership. Since early 2022, ACE has transitioned to a new Executive Director, Programs Director, Charity Evaluations Manager, Movement Grants Manager, Operations Director, and Communications Director. 

That said, ACE continues to assess organizational health as part of our charity evaluations—we assess whether any aspects of an organization’s governance or work environment pose a risk to its effectiveness or stability, thereby reducing its potential to help animals. Furthermore, bad actors and toxic practices could negatively affect the reputation of the broader animal advocacy movement, which is highly relevant for a growing social movement, as well as advocates’ wellbeing and their willingness to remain in the movement. You can read more about our reasoning here and about our current evaluation criteria here.

Thanks for your thought-provoking piece. We are continually refining our evaluation methods, so we will consider your points further about the kinds of instrumental information we might want to gather and how we could do so in a pragmatic way.

Thanks, Elisabeth

Thanks for the response, I appreciate it!

Executive summary: Charity evaluators like ACE should aim to assess not just their primary focus (e.g., animal welfare) but also other relevant values, to provide a fuller picture for donors and improve decision-making in effective altruism.

Key points:

  1. The debate over ACE’s evaluation methods highlights a tension between prioritizing pure animal welfare and considering broader values like social justice or organizational culture.
  2. Some argue that ACE should focus solely on effectiveness in animal welfare, while others defend its broader approach as useful for predicting an organization’s overall impact.
  3. Many charity evaluators operate in silos, ignoring cross-cutting impacts; a better system would involve coordination between evaluators to assess organizations from multiple value perspectives.
  4. An ideal system would either feature comprehensive evaluators assessing all major values or independent aggregators synthesizing findings from specialized evaluators.
  5. While ACE’s current approach is imperfect, it is still preferable to narrowly focused evaluations that ignore externalities and broader ethical considerations.
  6. The EA community may benefit from new initiatives or organizations dedicated to filling these evaluation gaps, though practical challenges in assessing indirect impacts remain a key obstacle.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

“Looking at aspects of the internal culture of these organizations is useful to predict harder to measure aspects of how healthy an organization is, and how much good it can ultimately do”

This also could've helped with other orgs over the years, where the "culture" stuff turned out to have important signal. E.g. FTX, Leverage Research.
