
I'm writing this post as a work-in-progress guide to thinking about different kinds of uncertainty when building a cost-effectiveness analysis (CEA) model. I'm publishing it as-is to check whether it is generally understood and whether it seems important and relevant; if there's interest, I'll write a more complete and polished post on the topic.

When estimating a variable, we face both "epistemic" and "statistical" (or "aleatoric") uncertainty about its value. Statistical uncertainty can be thought of as the inherent randomness involved (e.g. the number of heads in 10 coin flips). Epistemic uncertainty, on the other hand, comes from our lack of knowledge (e.g. how many coins did people flip in total yesterday?).

For example, in a randomized controlled trial (RCT) we try to estimate the effect of a treatment on some outcome variable. That effect usually depends on many particular factors, such as the financial and cultural characteristics of the population involved, the time of day the treatment was administered, and the prevalence of a specific disease in a particular location. With unlimited knowledge, we could account for all of these factors and estimate the effect as a function of them. Since this isn't possible, we can instead model the effect as a random variable with some statistical uncertainty.

If we conduct that RCT well, with a large sample drawn uniformly from our target population, we can find a good fit for the distribution of the effect, which can then be used to predict the effect of a future large-scale program in the same population. In this case, we have very little epistemic uncertainty about the effect, but the statistical uncertainty is still present and irreducible.
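To make this concrete, here is a minimal Python sketch under assumed numbers: individual effects are drawn from a normal distribution with mean 2.3 and standard deviation 1.0, and `summarize_trial` is a hypothetical helper. As the sample size grows, the standard error of the average effect shrinks, while the spread of individual effects stays near 1.0. More data reduces our uncertainty about the average, but not the statistical variation across individuals.

```python
import random
import statistics

def summarize_trial(n, true_mean=2.3, individual_sd=1.0, seed=0):
    """Simulate one hypothetical trial with n participants.

    Assumes normally distributed individual effects (an illustrative choice).
    Returns (spread of individual effects, standard error of the mean effect).
    """
    rng = random.Random(seed)
    effects = [rng.gauss(true_mean, individual_sd) for _ in range(n)]
    spread = statistics.stdev(effects)  # statistical (aleatoric) variation across people
    std_error = spread / n ** 0.5       # uncertainty about the average, shrinks with n
    return spread, std_error

for n in (10, 100, 10_000):
    spread, se = summarize_trial(n)
    print(f"n={n:>6}: individual sd ~ {spread:.2f}, standard error ~ {se:.3f}")
```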

If we tried to apply the results of that RCT to a different population, we would have to account for epistemic uncertainty as well. For example, if we conducted the RCT in a rural area of a developing country and wanted to apply the results to an urban area of the same country, we would have to make some educated guesses about the effect of the treatment in the new population. This is epistemic uncertainty territory.

Relevance to CEAs

Guesstimate, Squiggle, Dagger and similar programs work by Monte Carlo simulation: they run the computation many times, each time sampling every input variable randomly according to its given distribution. This gives us many samples for each variable in the computation, which we can think of as approximating its distribution. We care about these distributions, and not just their expected values, because they quantify and show the uncertainty involved in the computation. The uncertainty we want to express is generally the epistemic uncertainty about the expected cost-effectiveness.
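Below is a minimal sketch of that process in plain Python; it is not how any of these tools are implemented. It assumes a toy CEA in which benefit per $1,000 equals effect per person divided by cost per person, with hypothetical 90% intervals of $5 to $20 for cost and 0.01 to 0.05 benefit units for effect. Each interval is turned into a lognormal distribution by the hypothetical helper `lognormal_from_ci`.

```python
import math
import random
import statistics

Z90 = statistics.NormalDist().inv_cdf(0.95)  # ~1.645: a 90% interval spans +/- Z90 sd

def lognormal_from_ci(rng, low, high):
    """Sample a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return rng.lognormvariate(mu, sigma)

def simulate_cea(n_samples=100_000, seed=1):
    """Run the toy model n_samples times, resampling every input each time."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        cost_per_person = lognormal_from_ci(rng, 5, 20)         # hypothetical, in $
        effect_per_person = lognormal_from_ci(rng, 0.01, 0.05)  # hypothetical benefit units
        samples.append(1000 * effect_per_person / cost_per_person)  # benefit per $1,000
    return samples

samples = simulate_cea()
cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"mean: {statistics.fmean(samples):.2f}")
print(f"90% interval: [{cuts[0]:.2f}, {cuts[-1]:.2f}]")
```

The fixed seed only makes runs reproducible. The spread of `samples` reflects whatever uncertainty we put into the inputs, which is why it matters what kind of uncertainty those input distributions represent.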

Examples:

  • If we think charities generally have a 60% chance of success, that’s a type of statistical uncertainty. We wouldn’t want to model it as a Bernoulli distribution, but rather use that 60% directly as a constant multiplicative factor. Better yet, use a beta distribution instead of a constant to express how unsure we are about the exact chance of success.
  • Say our proposed charity to promote regulating hat sizes in summer is only cost-effective if there are no existing regulations in place, but we haven’t yet taken the time to check. In this case, we are epistemically uncertain, and we should use a Bernoulli distribution.
  • Suppose we find a paper that conducted an RCT or a meta-analysis of an intervention of interest and found an average effect size of 2.3 with a 90% CI of [1.8, 2.9]. When we use that variable in our calculation, what should our uncertainty estimate be?
    • First, while the individual effect is a random variable that’s statistically uncertain, we can think of its average as an unknown constant about which we are epistemically uncertain. So in our model, we should generally use the average effect rather than model how the effect is distributed across individuals.
    • Second, and a bit of a tangent, the definition of a confidence interval in frequentist statistics is weird. In that setting, we assume there is a true number, the actual average effect size, and we have some procedure to estimate it from samples drawn from an actual random process whose parameters we don’t know. The confidence interval is computed from these samples so that if we repeated the procedure again and again over independent sets of samples, we would get many intervals, roughly 90% of which would include the actual parameter.
      • This is different from how we usually think of our credible intervals: as expressing the belief that the true value has a 90% chance of being inside our interval.
      • I’m not yet sure how best to handle this, but in many cases it seems reasonable to expect that the authors would have arrived at a similar-looking credible interval.
    • Third, this doesn’t take into account external validity (and only partially addresses internal validity). I’m not yet sure about the best approach for representing external/internal validity in the model as epistemic uncertainty. 
  • Generally, we have epistemic uncertainty around summary statistics, which themselves are aggregations of statistical uncertainty.
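A minimal sketch of the first example, with hypothetical numbers: a charity worth 100 (arbitrary units) if it succeeds, and a 60% chance of success. It compares the three ways of entering that 60% into a Monte Carlo model. All three give an expected value near 60, but their intervals differ a lot: the constant has no spread, the Bernoulli gives [0, 100] (the coin-flip randomness of a single outcome), and a Beta(6, 4), an assumed choice of parameters with mean 0.6, gives a spread that reflects uncertainty about the chance of success itself.

```python
import random
import statistics

VALUE_IF_SUCCESS = 100.0  # hypothetical value (arbitrary units) if the charity succeeds
N = 50_000
rng = random.Random(42)

def summarize(samples):
    """Return (mean, 5th percentile, 95th percentile) of Monte Carlo samples."""
    cuts = statistics.quantiles(samples, n=20)
    return statistics.fmean(samples), cuts[0], cuts[-1]

# 1. Constant: plug the 60% in directly as a multiplicative factor.
constant = [VALUE_IF_SUCCESS * 0.6 for _ in range(N)]

# 2. Bernoulli: each sample is a single success (100) or failure (0).
#    This spread is outcome randomness, not uncertainty about the expected value.
bernoulli = [VALUE_IF_SUCCESS if rng.random() < 0.6 else 0.0 for _ in range(N)]

# 3. Beta(6, 4): mean 0.6; its spread expresses how unsure we are about the 60% itself.
beta = [VALUE_IF_SUCCESS * rng.betavariate(6, 4) for _ in range(N)]

for name, samples in [("constant", constant), ("bernoulli", bernoulli), ("beta", beta)]:
    mean, p5, p95 = summarize(samples)
    print(f"{name:>9}: mean {mean:5.1f}, 90% interval [{p5:5.1f}, {p95:5.1f}]")
```

Scaling both Beta parameters up by the same factor keeps the mean at 0.6 and narrows the interval, expressing more confidence. In the hat-regulation example, by contrast, the 0-or-full spread of a Bernoulli is what we want, since it represents a fact we could find out but haven’t yet.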
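The repeated-sampling definition of a confidence interval can be checked with a small simulation. This sketch assumes a hypothetical population with true mean effect 2.3 and standard deviation 1.5, runs 2,000 hypothetical studies of 200 participants each, and builds a normal-approximation 90% interval (mean ± 1.645 standard errors) in each; `ci_coverage` is a hypothetical helper. Roughly 90% of the intervals contain the true mean. That 90% describes the procedure across repetitions, not the probability that one particular reported interval, such as [1.8, 2.9], contains the true value.

```python
import random
import statistics

Z90 = statistics.NormalDist().inv_cdf(0.95)  # ~1.645 for a two-sided 90% interval

def ci_coverage(true_mean=2.3, sd=1.5, n=200, trials=2_000, seed=7):
    """Fraction of repeated hypothetical studies whose 90% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = Z90 * statistics.stdev(sample) / n ** 0.5  # normal approximation
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials

print(f"coverage: {ci_coverage():.3f}")  # should be close to 0.90
```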

Relevant links

  • Wikipedia page
  • Paper on the topic in the context of ERAs and Bayesian networks: link.
  • Interesting examinations of modeling uncertainty in CEAs, with links to more fun stuff: 1, 2, 3, 4.
Comments



I think that such work on fundamental tools is very important for improving the EA toolkit - thank you Edo!

Executive summary: The post discusses different types of uncertainty in cost-effectiveness analyses (CEAs), specifically epistemic uncertainty due to lack of knowledge vs. statistical uncertainty inherent to the data.

Key points:

  1. Epistemic uncertainty arises when generalizing results to new contexts, while statistical uncertainty is irreducible randomness.
  2. Monte Carlo simulation is used in CEAs to quantify uncertainty by sampling input variables.
  3. Confidence intervals express statistical uncertainty, while credible intervals represent epistemic uncertainty.
  4. Care is needed when using summary statistics like effect sizes in models, as they contain both types of uncertainty.
  5. Modeling external validity as epistemic uncertainty is an open challenge.
  6. Overall, epistemic uncertainty around averages aggregates statistical uncertainty within data.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

More from EdoArad