(Content warning: this post mentions a question from the 2024 EA Survey. If you haven't answered it yet and plan to do so, please do that first before reading on)
The 2024 EA Survey asks people which of the following interventions they prefer:
1. An intervention that averts 1,000 DALYs with 100% probability
2. An intervention that averts 100,000 DALYs with 1.5% probability
In theory, this is a simple question: intervention (2) has 50% more expected value (1,500 expected DALYs averted versus 1,000).
In practice, I believe this is an absurd premise, the kind that never happens in real life. How would you know that the probability that an intervention works is 1.5%?
My rule of thumb is that most real-world probabilities could be off by a percentage point or so. Note that this is an absolute error, not a relative one: not 1% of the stated value too low or too high, but an entire percentage point. For the survey question, it might well be that intervention (1)'s success rate is only 99%, and intervention (2) could have a success rate anywhere in the low single-digit percentages.
I don't have a good justification for this rule of thumb[1]. Part of it is probably psychological: humans are most familiar with concepts like "rare". We occasionally use percentages but rarely (no pun intended) use permilles or smaller units. Part of it is technical: small probabilities are harder to measure directly, so they are derived from a model. The model is imperfect, and the model inputs are likely to be imprecise.
For intervention (1), my rule of thumb does not have a large effect on the expected impact. For intervention (2), the effect is very large[2]. This is what makes the survey question so hard to answer, and the answers so hard to interpret.
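To make this concrete, here is a minimal sketch (my own illustration, not part of the survey) of how a ±1 percentage point band on the stated probabilities plays out; the uniform band is an assumption:

```python
def expected_dalys(dalys_if_success: float, p_success: float) -> float:
    """Expected DALYs averted by an all-or-nothing intervention."""
    return dalys_if_success * p_success

# Intervention (1): 1,000 DALYs at a nominal 100% probability.
# Under the rule of thumb, the probability might be as low as 99%.
low_1, high_1 = expected_dalys(1_000, 0.99), expected_dalys(1_000, 1.00)

# Intervention (2): 100,000 DALYs at a nominal 1.5% probability.
# Under the rule of thumb, the probability could plausibly be 0.5% to 2.5%.
low_2, high_2 = expected_dalys(100_000, 0.005), expected_dalys(100_000, 0.025)

print(f"Intervention (1): {low_1:.0f} to {high_1:.0f} expected DALYs averted")
print(f"Intervention (2): {low_2:.0f} to {high_2:.0f} expected DALYs averted")
# Intervention (1): 990 to 1000 expected DALYs averted
# Intervention (2): 500 to 2500 expected DALYs averted
# The midpoint of intervention (2)'s band is still 1,500, which is
# footnote [2]'s point about symmetric error margins.
```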
There are, of course, established ways to deal with this mathematically. For example, one could use a portfolio approach that allocates some fraction of resources to intervention (2), as sketched below. Such strategies are valuable, even necessary, for dealing with this type of question. As a survey respondent, however, I felt frustrated at having just two options. I feel that the survey question creates a false sense of "all you need is expected value"; it asks for a black-and-white answer where the reality has many shades.[3]
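As a rough sketch of the portfolio idea (again my own illustration; the allocation fractions and the assumption that impact scales linearly with resources are mine, not the survey's):

```python
def portfolio_expected_dalys(fraction_to_2: float, p2: float) -> float:
    """Expected DALYs averted when a fraction of resources goes to
    intervention (2) and the rest to intervention (1), assuming impact
    scales linearly with the resources allocated."""
    return (1 - fraction_to_2) * 1_000 * 1.00 + fraction_to_2 * 100_000 * p2

for fraction in (0.0, 0.25, 0.5, 1.0):
    # Show how sensitive each allocation is if intervention (2)'s true
    # probability is 0.5%, 1.5%, or 2.5%.
    low, mid, high = (portfolio_expected_dalys(fraction, p) for p in (0.005, 0.015, 0.025))
    print(f"{fraction:.0%} to (2): {low:.0f} / {mid:.0f} / {high:.0f} expected DALYs averted")
```

Putting everything into intervention (2) maximizes the point-estimate expected value but also the spread; a mixed allocation gives up some expected value in exchange for less sensitivity to the uncertain 1.5% figure.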
My recommendation and plea: Please communicate humbly, especially when using very low probabilities. Consider that all your numbers, and low probabilities especially, might be inaccurate. When designing thought experiments, keep them as realistic as possible so that they elicit better answers. This reduces misunderstandings, pitfalls, and potentially compounding errors, and it produces better communication overall.
1. I welcome pointers to research about this! ↩︎
2. The effect is large in the sense that the expected intervention value could be anywhere from 500 to 2,500 DALYs. However, the expected expected intervention value does not change if we just add symmetric error margins. ↩︎
3. Caveat: I don't know what the survey question was intended to measure. It might well be a good question, given its goal. ↩︎
Thanks for the thoughtful response.
On (1), I'm not really sure the uncertainty and the trust in the estimate are separable. A probability estimate of a nonrecurring event[1] is fundamentally a label someone[2] applies to how confident they are that something will happen. A corollary is that you should probably take into account how a probability estimate could actually have been reached, your trust in that reasoning, and the likelihood of bias when deciding how to act.[3]
On (2), I agree with your comments about the OP's point: if the probabilities are ±1 percentage point with the error symmetrically distributed, they're still 1.5% on average[4], though in some circumstances introducing error bars might affect how you handle risk. But as I've said, I don't think the distribution of errors looks like this when it comes to assessing whether long shots are worth pursuing (not even under the assumption of good faith). Frankly, I'd be pretty worried if hits-based grant-makers didn't think the same, and this question puts me in their shoes.
Your point about analytic philosophy often expecting literal answers to slightly weird hypotheticals is a good one. But EA isn't just analytic philosophy and St Petersburg paradoxes; it's also people literally coming up with best guesses of the probabilities of things they think might work and multiplying them (and a whole subculture built on that, guesstimating just how impactful the "crazy train" long-shot ideas they're curious about might be). So I think it's pretty reasonable to treat this not as a slightly daft hypothetical where a 1.5% probability is an empirical reality,[5] but as a real-world grant-award decision where the "1.5% probability" is a suspiciously precise credence, and you've got to decide whether to trust it enough to fund it over something that definitely works. In that situation, I think I'm discounting the estimated chance of success of the long shot by more than 50%.
FWIW, I don't take the question as evidence that the survey designers are biased in any way.
"this will either avert 100,000 DALYs or have no effect" doesn't feel like a proposition based on well-evidenced statistical regularities...
not me. Or at least a "1.5%" chance of working for thousands of people and implicitly a 98.5% chance of having no effect on anyone certainly doesn't feel like the sort of degree of precision I'd estimate to...
Whilst it's the unintended consequences of how the question was framed, this example feels particularly fishy. We're asked to contemplate trading off something that certainly will work against something potentially higher yielding that is highly unlikely to work, and yet the thing that is highly unlikely to work turns out to have the higher EV because someone has speculated on its likelihood to a very high degree of precision, and those extra 5 thousandths made all the difference. What's the chance the latter estimate is completely bogus or finessed to favour the latter option? I'd say in real world scenarios (and certainly not just EA scenarios) it's quite a bit more than 5 in 1000....
that one's a math test too ;-)
maybe a universe where physics is a god with an RNG...