
Why I crossposted this: I found it to be an interesting perspective on a not-uncommon assumption within EA (that every charity should be conducting self-evaluation), written by someone with a lot of field experience in charity consulting/evaluation (who is, I think, mostly aligned with EA's views on the importance of effectiveness).

I don't necessarily endorse any of this, but reasons 2-4 all seem like good reasons to at least not pursue certain types of self-evaluation. Reason 1 seems like a different kind of problem, but still rings true and points to the need for funders to be strict about what sorts of research they care about, or otherwise aim at changing the incentives in place.


Most charities should not evaluate their own impact. Funders should stop asking them to evaluate themselves. For one thing, asking somebody to mark their own homework was never likely to be a good idea.

This article explains the four very good reasons that most charities should not evaluate themselves, and gives new data about how many of them are too small.

Most operational charities should not (be asked to) evaluate themselves because:

1. They have the wrong incentive.

Their incentive is (obviously!) to make themselves look as great as possible. Evaluations are used to compete for funding, so they are motivated to rig the research to make it flattering and/or bury research that doesn’t flatter them. I say this having been a charity CEO myself and having done both.

2. They lack the necessary skills in evaluation. 

Most operational charities are specialists in, say, supporting victims of domestic violence or delivering first aid training or distributing cash in refugee camps. These are completely different skills from doing causal research, and one would not expect expertise in these unrelated areas to be co-located.

3. They often lack the funding to do evaluation research properly. 

One major problem is that a good experimental evaluation may involve gathering data about a control group which does not get the programme or which gets a different programme, and few operational charities have access to such a set of people.

A good guide is a mantra from evidence-based medicine, that research should “ask an important question and answer it reliably”. If there is not enough money (or sample size) to answer the question reliably, don’t try to answer it at all.

4. They’re too small. 

Specifically, their programmes are too small: they do not have enough sample size for evaluations of just their programmes to produce statistically meaningful results, that is, to distinguish the effects of the programme from those of other factors or random chance. As a result, self-evaluations by operational charities are quite likely to be just wrong. For example, when the Institute for Fiscal Studies did a rigorous study of the effects of breakfast clubs, it needed 106 schools in the sample: far more than most operational charities providing breakfast clubs have.
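To see why sample size matters so much, here is a rough power calculation. It is a minimal sketch assuming the simplest possible design: two groups compared on a yes/no outcome with a two-sided two-proportion z-test at 5% significance and 80% power. The function name and the 30% versus 25% reoffending rates are hypothetical illustrations, not figures from the IFS or JDL studies. The sketch also ignores clustering (e.g. pupils grouped within schools), which in practice pushes the required numbers higher still.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treated: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed in EACH group to detect the difference
    between two proportions with a two-sided two-proportion z-test.
    Assumes independent individuals (no clustering)."""
    if p_control == p_treated:
        raise ValueError("the two rates must differ")
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)              # about 0.84 for 80% power
    p_bar = (p_control + p_treated) / 2    # pooled rate under "no effect"
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treated * (1 - p_treated))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treated) ** 2)


# Hypothetical: baseline reoffending of 30%, programme hoped to cut it to 25%.
print(sample_size_per_group(0.30, 0.25))
```

Under these assumptions, reliably detecting a five-percentage-point drop from a 30% baseline needs roughly 1,250 people in each group, far beyond the caseload of a typical small programme. Smaller true effects need many more.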

Giving Evidence has done some proper analysis to corroborate this view that many operational charities’ programmes are too small to reliably evaluate. The UK Ministry of Justice runs a ‘Data Lab’, which any organisation running a programme to reduce re-offending can ask to evaluate that programme: the Justice Data Lab uses the MoJ’s data to compare the re-offending behaviour of participants in the programme with that of a similar (‘propensity score-matched’) set of non-participants. It’s glorious because, for one thing, it shows loads of charities’ programmes all evaluated in the same way, on the same metric (12-month reoffending rate) by the same independent researchers. It is the sole such dataset of which we are aware, anywhere in the world.
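The core idea of propensity-score matching can be sketched as follows. This is a generic illustration, not the JDL’s actual procedure: it assumes each person’s propensity score (their estimated probability of joining the programme, typically obtained from a model such as a logistic regression on characteristics like age and criminal history) has already been computed. It then uses 1:1 nearest-neighbour matching with replacement within a caliper of 0.05, an arbitrary choice. The `Person` type, the function names and the data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    propensity: float   # estimated probability of joining the programme (assumed pre-computed)
    reoffended: bool    # outcome: reoffended within 12 months?


def match_nearest(treated, pool, caliper=0.05):
    """1:1 nearest-neighbour matching on propensity score, with replacement.
    Returns (participant, matched non-participant) pairs; participants with
    no non-participant within `caliper` are dropped from the comparison."""
    pairs = []
    for t in treated:
        best = min(pool, key=lambda c: abs(c.propensity - t.propensity), default=None)
        if best is not None and abs(best.propensity - t.propensity) <= caliper:
            pairs.append((t, best))
    return pairs


def reoffending_rates(pairs):
    """Reoffending rate among matched participants and among their matches."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no matched pairs")
    treated_rate = sum(t.reoffended for t, _ in pairs) / n
    comparison_rate = sum(c.reoffended for _, c in pairs) / n
    return treated_rate, comparison_rate


participants = [Person(0.62, False), Person(0.40, True), Person(0.90, False)]
non_participants = [Person(0.60, True), Person(0.41, True), Person(0.10, False)]
pairs = match_nearest(participants, non_participants)
print(len(pairs), reoffending_rates(pairs))  # 2 (0.5, 1.0): the 0.90 participant has no close match
```

In this sketch, participants without a sufficiently similar non-participant are simply dropped, so the effective sample being compared can be smaller than the programme’s headcount.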

In the most recent data (all its analyses up to October 2020), the JDL had analysed 104 programmes run by charities (‘the voluntary and community sector’), of which fully 62 were too small to produce conclusive results. That is, 60% of the charity-run programmes were too small to evaluate reliably.
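What “too small to produce conclusive results” means can be illustrated with a confidence interval on the difference in reoffending rates. This is a minimal sketch, not the JDL’s own statistical method: it assumes a simple Wald 95% interval for a difference of two proportions, and the function name and counts are hypothetical. A result counts as conclusive only if the whole interval lies on one side of zero; an interval entirely above zero would mean the programme increased reoffending.

```python
import math
from statistics import NormalDist


def classify_effect(reoff_treated: int, n_treated: int,
                    reoff_comparison: int, n_comparison: int,
                    confidence: float = 0.95) -> str:
    """Classify a programme's effect on reoffending using a Wald confidence
    interval for (participant rate - comparison rate)."""
    if n_treated <= 0 or n_comparison <= 0:
        raise ValueError("both groups need at least one person")
    p_t = reoff_treated / n_treated
    p_c = reoff_comparison / n_comparison
    diff = p_t - p_c  # negative means fewer participants reoffended
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_comparison)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    low, high = diff - z * se, diff + z * se
    if high < 0:
        return "reduced reoffending"
    if low > 0:
        return "increased reoffending"
    return "inconclusive"  # interval includes zero: could be help, harm or nothing


# Hypothetical: identical 25% vs 30% reoffending rates at two different scales.
print(classify_effect(15, 60, 18, 60))        # small programme -> inconclusive
print(classify_effect(500, 2000, 600, 2000))  # large programme -> reduced reoffending
```

The observed rates are the same in both calls; only the number of people differs. The Wald interval is chosen here for simplicity and behaves poorly with very small groups or rates near 0% or 100%.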

The analyses also show the case for reliable evaluation, rather than just guessing which charity-run programmes work or assuming that they all do:

a. Some charity-run programmes create harm: they increase reoffending, and

b. Charity-run programmes vary massively in how effective they are.

Hence most charities should not be PRODUCERS of research. But they should be USERS of rigorous, independent research – about where the problems are, why, what works to solve them, and who is doing what about them.

Comments (3)



Some other, partly overlapping reasons:

  • In rushing to measure their impact to meet requests for impact evaluation, they might just focus on the wrong things, e.g. proxy metrics that sound like good impact evaluation but aren't really very good indicators. If measuring on their "own" timelines, rather than when asked, charities might have more scope and time to do it carefully.
  • I think there's something to be said for just trying to do something really well and only subsequently stopping to take stock of what you have or haven't achieved. (We've taken pretty much the opposite approach at Animal Advocacy Careers and I periodically wonder whether that was a mistake.)
  • If you're doing something that seems pretty clearly likely to be cost-effective, given the available evidence, spending resources on further evaluation might just be a waste.
  • Similarly, unless conducting and disseminating research is an important part of your theory of change, the research focus might be a distraction if it doesn't seem likely to affect your decision-making.

Thanks for this: a really interesting read!

I was wondering where you would suggest charities should get this 'independent research' from? One of the EA virtual events I attended briefly mentioned 'expert' research. Would you agree? If so, I am curious what you mean by 'experts'?

Again, thanks for the post!

There are a lot of ways that scientific research can be useful to charities. For example, a vaccination charity might design its program based on the design of programs that were shown to be successful at increasing vaccination rates in randomized controlled trials. 

This is different from testing one's own program, which might be impractical for the reasons outlined in this post, but it's a "second-best" option that should at least make you more likely to run an impactful program.

I think EA tends to use a pretty standard definition of "experts" -- people who know a lot about a subject, and have some degree of skill in conducting research that leads them to learn more true information about the world.
