
Epistemic status: gestural food for thought.

This is a post to aggregate a bunch of boring or obvious observations that are easy to overlook. I don't expect anything to be new. But I do think difficulties gauging impact, in aggregate, form a sort of missing mood that we should be paying more attention to. At the end of the post, I'll touch on why.

Let us count the ways

Here are some factors that can make assessing impact complex:

  • Some interventions are backed by scientific studies.
    • Scientific studies vary in quality in many ways. They can be larger or smaller, more or less numerous, clearly biased or apparently unbiased, clearly significant or only marginally significant, randomized or non-randomized, observational or experimental, etc.
    • Even for good studies, the same interventions may become better or worse over time as conditions in the world change, or may be very particular to certain places. Cash transfers, for example, might just work way better in certain parts of the world than others, and it may be hard to predict how or why in advance.
  • Many interventions require many simultaneous layers of involvement.
    • Suppose I give to the Against Malaria Foundation. I can say "I estimate I am saving a life per $5,000 I spend." But I read Peter Singer when I was 13, then Scott Alexander when I was 19, and I likely wouldn't have ended up donating much without these. I also couldn't give to AMF if it didn't exist, so I owe a debt to Rob Mather. And perhaps whoever told Scott Alexander about AMF. All these steps are necessary to actually "save a life", so we run the risk of massively overcounting if we give every person in the chain "full credit".
    • But there's no objectively rigorous way to decide who gets how much of the credit! Just using counterfactuals doesn't work; it may be the case that all of us are required and a single person "out of the chain" breaks it down. But we can't all get all the credit!
    • Plus many interventions, like in the AMF example, are mostly just reducing probabilities across large numbers of people anyway. What does it even mean for "my money" to "save a life"? Once the money all goes into a pool, whose money actually funds which nets? And which nets prevent cases of malaria that would have been fatal? There is no way to answer these questions, even in principle.
  • Some (perhaps all) interventions rely on difficult-to-impossible philosophical questions to resolve.
    • How should we weigh insect suffering? All we can do is guess - learning more facts about insects doesn't really get us over Nagel's "What is it like to be a bat?" hurdle. Empirical information, analogies, and intuition pumps all can help, but there are fundamental judgment calls at play.
    • How to assess well-being, and how to weigh it against survival, is another example that is hard to boil down to numbers: there are lots of ways to do it (QALYs, seeing how much people would pay to avert various harms, natural experiments) but none is perfect and all involve their own judgment calls.
  • Some interventions require certain thresholds being met or they don't actually accomplish anything.
    • Donating to a political campaign that promises credibly to do something good might help bring about that good thing. But if the campaign fails, that donation accomplished basically nothing.
    • Existential risk mitigation efforts (as such) only do any good if they work. If the world ends anyway, that effort didn't actually accomplish anything.
    • Plenty of interventions can also backfire.
  • Some interventions aim to increase or decrease probabilities. There are a lot of ways to mess this up.
    • My least favorite arguments in intro EA messaging historically were things like: "even if the chance of [EVENT] is quite low, say 1%, then [ACTION] is extremely valuable." Anchor points like 1% may be dramatically too high.
    • In some sense almost every intervention "aims to increase or decrease probabilities", eg AMF is trying to decrease probabilities of preventable malaria deaths. But x-risk or lobbying efforts are trying to increase or decrease one big probability. This probability is generally unknown and it's very difficult to figure out how it moved after the fact (much less in advance).
    • Thinking in terms of probability is important but just making up numbers introduces lots of room for bias. Remember the replication crisis? Even strong institutional processes very focused on truth seeking (at least in theory) can produce lots of bunk for a long time before anyone notices.
  • People often make decisions about what to prioritize based on cultural and peer group effects.
    • This worsens a lot when money and social belonging are at stake. It's hard to think clearly if you're trying to land a cool job or break into a friend group. Especially for people from ages, say, 16-25.
    • My psychological state thinking about this sort of thing feels very different now that I'm a little older and do not rely on EA centrally as a peer community or source of job opportunities. If I didn't have a stable career outside of EA I'd expect my cognition to be really, really warped by the differential probabilities of "finding an in" if I held various beliefs.
  • Outside vs. inside view debates are really hard to settle.
    • If most observers outside a community think something seems obviously false, and most people inside this community think it seems obviously true, it's difficult in either position to make progress.
    • Inside view logic often has lots of jargon, canon, and sneakily shared assumptions. Outsiders can't point these out in a satisfying way because to them it just seems like random nonsense.
    • But outside view logic treats everything in the inside view as random nonsense, and some of it often contains good insights.
  • Different reference classes also complicate outside view concerns.
    • "Most experts think X" claims are especially slippery. Who's an expert? Who's highly engaged? My "alarmingly self-reinforcing memeplex" might be your "emerging field of impressive scholarship".
    • You can see endless telescoping debates of this sort even in very tight niches within quite modest epistemic communities. Every academic field has its civil wars, with no clear resolution in sight.
    • If it's hard to settle questions even in fields with tons of high quality studies, imagine how hard it is to get a definitive answer for really big questions like "what are the odds humanity goes extinct this century". Worth asking and deeply investigating, but hard not to have an industrial salt shaker ready for whatever you find.
  • Lots more!
    • I'll end the list here because I'm running out of steam, but it isn't exhaustive.
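
To make the credit-assignment point from the list above a bit more concrete, here is a minimal sketch of how counterfactual reasoning overcounts when every link in a chain is necessary. The chain members come from the AMF example; the `outcome` rule (the donation only happens if every link is present) and the figure of one life saved are illustrative assumptions, not real estimates.

```python
# Hypothetical chain: each link is ASSUMED to be strictly necessary.
chain = ["Peter Singer", "Scott Alexander", "Rob Mather", "me"]
lives_saved = 1  # illustrative number, not a real estimate


def outcome(participants):
    """Lives saved, assuming the chain only works if every link is present."""
    return lives_saved if set(chain) <= set(participants) else 0


def counterfactual_credit(person):
    """Lives saved with everyone, minus lives saved if this person is removed."""
    without = [p for p in chain if p != person]
    return outcome(chain) - outcome(without)


credits = {p: counterfactual_credit(p) for p in chain}
total_claimed = sum(credits.values())

print(credits)               # every link gets "full credit" of 1
print(total_claimed)         # 4 lives claimed in total
print(outcome(chain))        # but only 1 life was actually saved
```

Under these assumptions each person's counterfactual impact equals the whole outcome, so the credits sum to four times what actually happened.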

Putting it together

I'll attempt an incomplete summary to gesture more directly at what I'm talking about.

Suppose you have a seemingly safe intervention - you're providing insecticide-coated bed nets to families in tropical areas. There have been lots of studies on efficacy, and lots of high-quality analysis on those studies. You've read some of this and come away with the belief that you can help save a life for a few thousand dollars.

But how sure are you? The research or analysis you read could be off. Your mom read somewhere that maybe the nets are used for fishing sometimes, which you don't think is a big issue. But you're not totally sure. The country in which your nets will be deployed is different from the countries where the studies were done, and it's been many years, so maybe there have been cultural changes that affect the likelihood nets will be used correctly.

And what happens once you save a life? What's the new life expectancy of that person? How high variance is it?

It becomes clear that there's a lot of value in really nailing down your intervention the best you can. Having tons of different reasons to think something will work. In this case, we've got:

  1. It's common sense that not being bit by mosquitos is nice, all else equal.
  2. The global public health community has clearly accomplished lots of good for many decades, so their recommendation is worth a lot.
  3. Lots of smart people recommend this intervention.
  4. There are strong counterarguments to all the relevant objections, and these objections are mostly shaped like "what about this edge case" rather than taking issue with the central premise.

Even if one of these fails, there are still the others. You're very likely to be doing some good, both probabilistically and in a more fuzzy, hard-to-pin-down sense.

So what?

To some degree, I think it's worth remembering that epistemology is hard for its own sake. The world is really complicated and it's easy to cache too many foundational beliefs and see long chains of inference as obvious axioms.

But also, I think this is a good argument for cause diversity and against confident pronouncements on differences in expected impact.

Wait a minute, you might say, isn't comparing interventions a huge part of effective altruism? And yes! It's really important to notice that some actions are predictably way more impactful than others. But I think this bottoms out in two ways:

  1. Commonly shared intuition
  2. Apples to apples comparisons

Commonly shared intuition

Some claims like "saving a child's life is more valuable than making a child have a deeper appreciation of the arts" pass a clear sniff test for almost all people who hear them. This kind of comparison is important for effective altruism.

Other claims are harder/less obvious, like "saving the lives of ten children across the world is more valuable than funding scholarships for ten children nearby". Most readers here would automatically agree. Not everyone would. But it's valuable to present people with lots of intuition pumps like this, to make sure we can think clearly and make good choices.

That being said, intuition is the operative concept. There's probably no objective account, or at least not one we have clear access to other than via introspection.

Apples to apples comparisons

You don't need to bottom out in intuition if you're comparing two interventions with the same target. If you want to improve global health, good to get the best bang for your buck in that arena.

If you want to save lives, you can try to save the most lives. Or you can try to save lives with the highest probability/confidence. Or some combination.

But it's good to notice that as the difference between interventions grows, comparisons get messy.


It's very difficult to make pronouncements about relative impact that are both confident and accurate. Even canonical cases are really complicated under the surface.

We should take care to let newcomers find their own paths, and give out the tools to think through the principles of EA using their own assumptions about the world. More diversity is better, here. And given how hard it is to get things right, I don't think there's any shame - or even necessarily an expected value hit - in choosing to play it safe.

And speaking just from my own perspective, don't underestimate the epistemic value of having a full social life and career that do not depend on EA. You only really notice the pull of needing peer/professional approval once it's gone, and you might marvel at just how strong it was.


It becomes clear that there's a lot of value in really nailing down your intervention the best you can. Having tons of different reasons to think something will work. In this case, we've got:

  1. It's common sense that not being bit by mosquitos is nice, all else equal.
  2. The global public health community has clearly accomplished lots of good for many decades, so their recommendation is worth a lot.
  3. Lots of smart people recommend this intervention.
  4. There are strong counterarguments to all the relevant objections, and these objections are mostly shaped like "what about this edge case" rather than taking issue with the central premise.

Even if one of these fails, there are still the others. You're very likely to be doing some good, both probabilistically and in a more fuzzy, hard-to-pin-down sense.


I really liked this framing, and think it could be a post on its own! It points at something fundamental and important like "Prefer robust arguments".

You might visualize an argument as a toy structure built out of building blocks. Some kinds of arguments are structured as towers: one conclusion piled on top of another, capable of reaching tremendous heights. But: take out any one block and the whole thing comes crumbling down.

Other arguments are like those Greek temples with multiple supporting columns. They take a bit more time to build, and might not go quite as high; but they are less reliant on any one particular column to hold their entire weight. I call such arguments "robust".
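
As a rough illustration of the tower/temple picture, the sketch below compares how likely each kind of argument is to stand. It assumes the blocks (or columns) succeed or fail independently, which real arguments rarely do, and the probabilities are made up purely for the example.

```python
from math import prod


def tower_holds(step_probs):
    """Tower: the conclusion stands only if EVERY step holds (conjunction)."""
    return prod(step_probs)


def temple_holds(column_probs):
    """Temple: the conclusion stands if AT LEAST ONE column holds (disjunction)."""
    return 1 - prod(1 - p for p in column_probs)


# Made-up numbers, assuming independence between blocks/columns.
tower = tower_holds([0.9, 0.9, 0.9, 0.9])    # four fairly strong steps in sequence
temple = temple_holds([0.6, 0.6, 0.6, 0.6])  # four mediocre, separate reasons

print(round(tower, 4))   # roughly 0.66
print(round(temple, 4))  # roughly 0.97
```

Under these assumptions, four mediocre but separate reasons beat four strong steps stacked in sequence. Correlated failures would narrow the gap, so the numbers show the shape of the point, not its size.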

One example of a robust argument that I particularly liked: the case for cutting meat out of your diet. You can make a pretty good argument for it from a bunch of different angles:

  • Animal suffering
  • Climate/reducing emissions
  • Health and longevity
  • Financial cost (price of food)

By preferring robustness, you are more likely to avoid Pascalian muggings, more likely to work on true and important areas, more likely to have your epistemic failures be graceful.

Some signs that an argument is robust:

  • Many people who think hard about this issue agree
  • People with very different backgrounds agree
  • The argument does a good job predicting past results across a lot of different areas

Robustness isn't the only, or even main, quality of an argument; there are some conclusions you can only reach by standing atop a tall tower! Longtermism feels shaped this way to me. But also, this suggests that you can do valuable work by shoring up the foundations and assumptions that are implicit in a tower-like argument, eg by red-teaming the assumption that future people are likely to exist conditional on us doing a good job.

Yeah! This was actually the first post I tried to write. But it petered out a few times, so I approached it from a different angle and came up with the post above instead. I definitely agree that "robustness" is something that should be seen as a pillar of EA - boringly overdetermined interventions just seem a lot more likely to survive repeated contact with reality to me, and I think as we've moved away from geeking out about RCTs we've lost some of that caution as a community.


There's an excellent old GiveWell blogpost by Holden Karnofsky on this topic called Sequence Thinking vs Cluster Thinking:

  • Sequence thinking involves making a decision based on a single model of the world: breaking down the decision into a set of key questions, taking one’s best guess on each question, and accepting the conclusion that is implied by the set of best guesses (an excellent example of this sort of thinking is Robin Hanson’s discussion of cryonics). It has the form: “A, and B, and C … and N; therefore X.” Sequence thinking has the advantage of making one’s assumptions and beliefs highly transparent, and as such it is often associated with finding ways to make counterintuitive comparisons.
  • Cluster thinking – generally the more common kind of thinking – involves approaching a decision from multiple perspectives (which might also be called “mental models”), observing which decision would be implied by each perspective, and weighing the perspectives in order to arrive at a final decision. Cluster thinking has the form: “Perspective 1 implies X; perspective 2 implies not-X; perspective 3 implies X; … therefore, weighing these different perspectives and taking into account how much uncertainty I have about each, X.” Each perspective might represent a relatively crude or limited pattern-match (e.g., “This plan seems similar to other plans that have had bad results”), or a highly complex model; the different perspectives are combined by weighing their conclusions against each other, rather than by constructing a single unified model that tries to account for all available information.

A key difference with “sequence thinking” is the handling of certainty/robustness (by which I mean the opposite of Knightian uncertainty) associated with each perspective. Perspectives associated with high uncertainty are in some sense “sandboxed” in cluster thinking: they are stopped from carrying strong weight in the final decision, even when such perspectives involve extreme claims (e.g., a low-certainty argument that “animal welfare is 100,000x as promising a cause as global poverty” receives no more weight than if it were an argument that “animal welfare is 10x as promising a cause as global poverty”).

Holden also linked other writing heavily overlapping with this idea:

Before I continue, I wish to note that I make no claim to originality in the ideas advanced here. There is substantial overlap with the concepts of foxes and hedgehogs (discussed by Philip Tetlock); with the model and combination and adjustment idea described by Luke Muehlhauser; with former GiveWell employee Jonah Sinick’s concept of many weak arguments vs. one relatively strong argument (and his post on Knightian uncertainty from a Bayesian perspective); with former GiveWell employee Nick Beckstead’s concept of common sense as a prior; with Brian Tomasik’s thoughts on cost-effectiveness in an uncertain world; with Paul Christiano’s Beware Brittle Arguments post; and probably much more.

Haha thanks for pointing this out! I'm glad this isn't an original idea; you might say robustness itself is pretty robust ;)

But there's no objectively rigorous way to decide who gets how much of the credit!

Why are you talking about "credit" at all? This is a confused concept. See sec 3.3.1 of Parfit's Ethics:

According to the Share-of-the-Total view, when a group collectively brings
about some outcome, each member counts as producing their “share” of the
total. For example, if 5 people work together to save 100 lives, each participant
is credited with saving 20 lives. But if our moral decision-making were guided
by this kind of accounting procedure, it could lead to foolish decisions with
obviously detrimental results, such as:

(a) unnecessarily joining a group of benefactors (who together save 100 lives)
who could do just as well without you, when you could instead have saved
10 additional lives independently, or

(b) single-handedly saving 50 lives instead of joining a group that needs you in
order to save 100.

As these cases demonstrate, it does not really matter what “share of the total”
gets attributed to you on the basis of the group that you join (as though group
size were inherently morally significant). What matters is just performing the
act, of those available to you, that results in the most lives being saved (or, more
generally, the most good being done), in total. In case (a), you can bring it about
that 110 lives are saved, rather than just 100, if you act independently. In case
(b), you can bring it about that 100 lives are saved, rather than just 50, if you
contribute to the group. These are the numbers that matter. No moral insight is
gained by dividing any of these numbers by the contributing group size to yield
some kind of agential “share”. To think otherwise, Parfit argues, is simply
a mistake.
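
The arithmetic in cases (a) and (b) can be sketched as follows. The passage does not say how large the groups are, so the sizes used here (six members after you join in case (a), five including you in case (b)) are assumptions made only for illustration. In case (b) the group is also assumed to save no one without you.

```python
def share_of_total(group_total, group_size):
    """Share-of-the-Total accounting: divide the outcome by the group size."""
    return group_total / group_size


# Case (a): the group saves 100 with or without you; alone you could save 10 more.
# ASSUMPTION: the group has 6 members once you join.
case_a = {
    "join group": {"my_share": share_of_total(100, 6), "world_total": 100},
    "act independently": {"my_share": 10, "world_total": 100 + 10},
}

# Case (b): the group needs you in order to save 100; alone you could save 50.
# ASSUMPTION: the group has 5 members including you, and saves 0 without you.
case_b = {
    "join group": {"my_share": share_of_total(100, 5), "world_total": 100},
    "act independently": {"my_share": 50, "world_total": 50},
}


def best_by(options, key):
    """Name of the option that maximizes the given accounting measure."""
    return max(options, key=lambda name: options[name][key])


for label, case in (("a", case_a), ("b", case_b)):
    print(label, "share says:", best_by(case, "my_share"),
          "| total says:", best_by(case, "world_total"))
```

With these assumed sizes, maximizing your "share" picks the worse option in both cases, while maximizing the total picks the one Parfit recommends.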

Yes, I agree it's a confused concept. But I think that same confused concept gets smuggled into conversations about "impact" quite often. 

It's also relevant for coordination: any time you can be the 100th person that joins a group of 100 that suddenly is able to save lots of lives, there first must have been 99 people who coordinated on the bet they'd be able to get you or someone like you. But how did they make that bet?

Yep. If you want to retrospectively compensate people for doing some good action, though, you might want to try to reward people in proportion to their "contribution".

Wouldn't that incentivize bad choices like (a) and (b)?

Fwiw, there's been some discussion on how to attribute impact to individual agents; e.g. here. I'm not read up on these issues, though, and couldn't say how participants in that debate would respond to your line of criticism.

Interesting, thanks. Note that the top-rated comment there is Toby Ord making just this Parfitian line of criticism.

If you compensate according to share-of-the-total, then yes. 

If you pay everyone according to their impact vs the case where they did nothing, then no, but you have a different problem. Suppose, for example, you want to reward a firing squad who have killed Hitler. Without any one of the shooters, the others would still have shot Hitler. So none of them can claim any counterfactual impact. But surely they should (collectively, if nothing else) be able to claim a reward.

So there is at least a practical question, of what procedure to use.
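
A small sketch of the firing squad problem, assuming any single shooter would have been enough and using a hypothetical squad of five. The `target_killed` rule is an illustrative stand-in, not a claim about how rewards should actually be computed.

```python
def target_killed(shooters):
    """ASSUMED overdetermined outcome: one shooter is enough."""
    return 1 if len(shooters) >= 1 else 0


squad = ["A", "B", "C", "D", "E"]  # hypothetical squad members

# Individual counterfactual impact: outcome with everyone, minus outcome without you.
individual = {
    s: target_killed(squad) - target_killed([x for x in squad if x != s])
    for s in squad
}

# Collective counterfactual impact: outcome with the squad, minus outcome with nobody.
collective = target_killed(squad) - target_killed([])

print(individual)  # every shooter's counterfactual impact is 0
print(collective)  # the squad as a whole has impact 1
```

Paying by individual counterfactual impact gives everyone nothing here, even though the group clearly made the difference, so the question of which procedure to use stays open.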

Another issue that makes it hard to evaluate global health interventions is the indirect effects of NGOs in countries far from the funders. For example this book made what I found to be a compelling argument that many NGOs in Africa are essentially funding civil war, via taxes or the replacement of government expenditure:


African politics are pretty far outside my field of expertise, but the magnitudes seem quite large. War in the Congo alone has killed millions of people over the past couple decades.

I don’t really know how to make a tradeoff here but I wish other people more knowledgeable about African politics would dig into it.
