Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses

cole_haus

(The same content is broken up into three posts and given a very slightly different presentation on my blog.)

Overview

GiveWell models the cost-effectiveness of its top charities. Because the input parameters are uncertain (How much moral weight should we give to increasing consumption? What is the current income of a typical GiveDirectly recipient?), the resulting cost-effectiveness estimates are also fundamentally uncertain. By performing uncertainty analysis, we get a better sense of just how uncertain the results are. Uncertainty analysis is also the first step on the route to sensitivity analysis. Sensitivity analysis reveals which input parameters each charity's cost-effectiveness estimate is most sensitive to. That kind of information helps us target future investigations (i.e. uncertainty reduction). The final step is to combine the individual charity cost-effectiveness estimates into one giant model. By performing uncertainty and sensitivity analysis on this giant model, we get a better sense of which input parameters have the most influence on the relative cost-effectiveness of GiveWell's top charities—i.e. how the charities rank against each other.

A key feature of the analysis outlined above and performed below is that it requires the analyst to specify their uncertainty over each input parameter. Because I didn't want all of the results here to reflect my idiosyncratic beliefs, I instead pretended that each input parameter is equally uncertain. This makes the results "neutral" in a certain sense, but it also means that they don't reveal much about the real world. To achieve real insight, you need to adjust the input parameters to match your beliefs. You can do that by heading over to the Jupyter notebook, editing the parameters in the second cell, and clicking "Runtime > Run all". This limitation means that all the ensuing discussion is more akin to an analysis template than a true analysis.

Uncertainty analysis of GiveWell's cost-effectiveness estimates

Section overview

GiveWell produces cost-effectiveness models of its top charities. These models take as inputs many uncertain parameters. Instead of representing those uncertain parameters with point estimates—as the cost-effectiveness analysis spreadsheet does—we can (should) represent them with probability distributions. Feeding probability distributions into the models allows us to output explicit probability distributions on the cost-effectiveness of each charity.

GiveWell's cost-effectiveness analysis

GiveWell, an in-depth charity evaluator, makes its detailed spreadsheet models available for public review. These spreadsheets estimate the value per dollar of donations to its 8 top charities: GiveDirectly, Deworm the World, Schistosomiasis Control Initiative, Sightsavers, Against Malaria Foundation, Malaria Consortium, Helen Keller International, and the END Fund. For each charity, a model is constructed that takes input values to an estimated value per dollar of donation to that charity. The inputs to these models vary from parameters like "malaria prevalence in areas where AMF operates" to "value assigned to averting the death of an individual under 5".

Helpfully, GiveWell isolates the input parameters it deems as most uncertain. These can be found in the "User inputs" and "Moral weights" tabs of their spreadsheet. Outsiders interested in the top charities can reuse GiveWell's model but supply their own perspective by adjusting the values of the parameters in these tabs.

For example, if I go to the "Moral weights" tab and run the calculation with a 0.1 value for doubling consumption for one person for one year—instead of the default value of 1—I see the effect of this modification on the final results: deworming charities look much less effective since their primary effect is on income.

Uncertain inputs

GiveWell provides the ability to adjust these input parameters and observe altered output because the inputs are fundamentally uncertain. But our uncertainty means that picking any particular value as input for the calculation misrepresents our state of knowledge. From a subjective Bayesian point of view, the best way to represent our state of knowledge on the input parameters is with a probability distribution over the values the parameter could take. For example, I could say that a negative value for increasing consumption seems very improbable to me but that a wide range of positive values seem about equally plausible. Once we specify a probability distribution, we can feed these distributions into the model and, in principle, we'll end up with a probability distribution over our results. This probability distribution on the results helps us understand the uncertainty contained in our estimates and how literally we should take them.

Is this really necessary?

Perhaps that sounds complicated. How are we supposed to multiply, add and otherwise manipulate arbitrary probability distributions in the way our models require? Can we somehow reduce our uncertain beliefs about the input parameters to point estimates and run the calculation on those? One candidate is to take the single most likely value of each input and use that value in our calculations. This is the approach the current cost-effectiveness analysis takes (assuming you provide input values selected in this way). Unfortunately, the output of running the model on these inputs is necessarily a point value and gives no information about the uncertainty of the results. Because the results are probably highly uncertain, being unable to talk about that uncertainty is a major loss. A second possibility is to take lower bounds on the input parameters and run the calculation on these values, and then to take upper bounds on the input parameters and run the calculation on those. This will produce two bounding values on our results, but it's hard to give them a useful meaning. If the lower and upper bounds on our inputs describe, for example, a 95% confidence interval, the lower and upper bounds on the result don't (usually) describe a 95% confidence interval, because all inputs sitting at their extremes at once is much less likely than any single input doing so.
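
To make the problem with the second possibility concrete, here is a minimal sketch. It uses a made-up two-input `toy_model` (a plain product, not one of GiveWell's models) and assumes both inputs are independent and log-normal with a 95% interval of 0.8 to 1.2. It compares the interval you get by running the model on the input bounds with the 95% interval of the simulated output.

```python
import math
import random
from statistics import NormalDist


def toy_model(a, b):
    # Hypothetical stand-in for a cost-effectiveness model (not GiveWell's).
    return a * b


lo, hi = 0.8, 1.2  # assumed 95% interval for each input

# Log-normal parameters whose central 95% interval is [lo, hi].
z = NormalDist().inv_cdf(0.975)
mu = (math.log(lo) + math.log(hi)) / 2
sigma = (math.log(hi) - math.log(lo)) / (2 * z)

# Second possibility from the text: run the model on the bounds only.
bounds_interval = (toy_model(lo, lo), toy_model(hi, hi))

# Monte Carlo alternative: sample the inputs and look at the output directly.
rng = random.Random(0)
outputs = sorted(
    toy_model(rng.lognormvariate(mu, sigma), rng.lognormvariate(mu, sigma))
    for _ in range(50_000)
)
mc_interval = (outputs[int(0.025 * len(outputs))], outputs[int(0.975 * len(outputs))])

# Fraction of simulated outputs that land inside the bounds-based interval.
coverage = sum(bounds_interval[0] <= y <= bounds_interval[1] for y in outputs) / len(outputs)

print("bounds-based interval:", bounds_interval)
print("simulated 95% interval:", mc_interval)
print("coverage of bounds-based interval:", coverage)
```

In this toy case the bounds-based interval is wider than the simulated 95% interval and covers noticeably more than 95% of the outputs, so it cannot be read as a 95% interval. How far off it is depends on the model and the number of inputs.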

Computers are nice

If we had to proceed analytically, working with probability distributions throughout, the model would indeed be troublesome and we might have to settle for one of the above approaches. But we live in the future. We can use computers and Monte Carlo methods to numerically approximate the results of working with probability distributions while leaving our models clean and unconcerned with these probabilistic details. Guesstimate is a tool you may have heard of that works along these lines and bills itself as "A spreadsheet for things that aren’t certain".

Analysis

We have the beginnings of a plan then. We can implement GiveWell's cost-effectiveness models in a Monte Carlo framework (PyMC3 in this case), specify probability distributions over the input parameters, and finally run the calculation and look at the uncertainty that's been propagated to the results.
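
As a rough illustration of that plan, the sketch below keeps the model as an ordinary function of numbers and puts all the probabilistic work in a separate sampling loop. It uses only Python's standard library rather than PyMC3, and `propagate`, `value_per_dollar` and the input distributions are hypothetical names and numbers invented for the example.

```python
import random


def propagate(model, samplers, n_samples=10_000, seed=0):
    """Run `model` on inputs drawn from `samplers` and return the sampled outputs."""
    rng = random.Random(seed)
    return [
        model(**{name: draw(rng) for name, draw in samplers.items()})
        for _ in range(n_samples)
    ]


def value_per_dollar(benefit_per_person, cost_per_person):
    # Hypothetical toy model; it knows nothing about probability distributions.
    return benefit_per_person / cost_per_person


# Assumed input distributions (illustrative numbers only).
samplers = {
    "benefit_per_person": lambda rng: rng.lognormvariate(0.0, 0.12),
    "cost_per_person": lambda rng: rng.lognormvariate(3.0, 0.12),
}

outputs = sorted(propagate(value_per_dollar, samplers))
summary = {
    "p05": outputs[int(0.05 * len(outputs))],
    "median": outputs[len(outputs) // 2],
    "p95": outputs[int(0.95 * len(outputs))],
}
print(summary)
```

The same `value_per_dollar` function still works on point estimates, e.g. `value_per_dollar(1.0, 20.0)`, which is the sense in which the model stays unconcerned with the probabilistic details.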

Model

The Python source code implementing GiveWell's models can be found on GitHub^[1]. The core models can be found in cash.py, nets.py, smc.py, worms.py and vas.py.

Inputs

For the purposes of the uncertainty analysis that follows, it doesn't make much sense to infect the results with my own idiosyncratic views on the appropriate value of the input parameters. Instead, what I have done is uniformly taken GiveWell's best guess and added and subtracted 20%. These upper and lower bounds then become the 90% confidence interval of a log-normal distribution^[2]. For example, if GiveWell's best guess for a parameter is 0.1, I used a log-normal with a 90% CI from 0.08 to 0.12.
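
One plausible way to turn "best guess ±20% as a 90% confidence interval" into log-normal parameters is sketched below. The function name `lognormal_from_best_guess` is made up for this example, and it assumes the interval is the central 90% interval (5th to 95th percentile); the notebook may construct its distributions differently.

```python
import math
import random
from statistics import NormalDist


def lognormal_from_best_guess(best_guess, spread=0.2, coverage=0.90):
    """Return (mu, sigma) of a log-normal whose central interval is best_guess -/+ spread."""
    lo = best_guess * (1 - spread)
    hi = best_guess * (1 + spread)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for a 90% interval
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * z)
    return mu, sigma


mu, sigma = lognormal_from_best_guess(0.1)

# Check the construction by sampling and reading off the 5th and 95th percentiles.
rng = random.Random(0)
samples = sorted(rng.lognormvariate(mu, sigma) for _ in range(100_000))
p05 = samples[int(0.05 * len(samples))]
p95 = samples[int(0.95 * len(samples))]
median = math.exp(mu)

print(p05, median, p95)
```

Under this construction the median of the distribution is the geometric mean of the two bounds (about 0.098 here), so it sits slightly below the best guess of 0.1.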

While this approach screens off my influence, it also means that the results of the analysis will primarily tell us about the structure of the computation rather than informing us about the world. Fortunately, there's a remedy for this problem too. I have set up a Jupyter notebook^[3] with all the input parameters to the calculation, which you can manipulate before rerunning the analysis. That is, if you think the moral weight given to increasing consumption ought to range from 0.8 to 1.5 instead of 0.8 to 1.2, you can make that edit and see the corresponding results. Making these modifications is essential for a realistic analysis because we are not, in fact, equally uncertain about every input parameter.

It's also worth noting that I have considerably expanded the set of input parameters receiving special scrutiny. The GiveWell cost-effectiveness analysis is (with good reason—it keeps things manageable for outside users) fairly conservative about which parameters it highlights as eligible for user manipulation. In this analysis, I include any input parameter which is not tautologically certain. For example, "Reduction in malaria incidence for children under 5 (from Lengeler 2004 meta-analysis)" shows up in the analysis which follows but is not highlighted in GiveWell's "User inputs" or "Moral weights" tab. Even though we don't have much information with which to second guess the meta-analysis, the value it reports is still uncertain and our calculation ought to reflect that.

Results

Finally, we get to the part that you actually care about, dear reader: the results. Given input parameters which are each distributed log-normally with a 90% confidence interval spanning ±20% of GiveWell's best estimate, here are the resulting uncertainties in the cost-effectiveness estimates:

Probability distributions of value per dollar for GiveWell's top charities

For reference, here are the point estimates of value per dollar using GiveWell's values for the charities:

GiveWell's cost-effectiveness estimates for its top charities

Charity	Value per dollar
GiveDirectly	0.0038
The END Fund	0.0222
Deworm the World	0.0738
Schistosomiasis Control Initiative	0.0378
Sightsavers	0.0394
Malaria Consortium	0.0326
Helen Keller International	0.0223
Against Malaria Foundation	0.0247

I've also plotted a version in which the results are normalized—I divided the results for each charity by that charity's expected value per dollar. Instead of showing the probability distribution on the value per dollar for each charity, this normalized version shows the probability distribution on the percentage of that charity's expected value that it achieves. This version of the plot abstracts from the actual value per dollar and emphasizes the spread of uncertainty. It also reëmphasizes the earlier point that—because we use the same spread of uncertainty for each input parameter—the current results are telling us more about the structure of the model than about the world. For real results, go try the Jupyter notebook!
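
The normalization itself is a one-liner. The sketch below applies it to made-up samples for two hypothetical charities whose value per dollar differs in scale but not in relative spread; none of these numbers come from the analysis.

```python
import random
from statistics import fmean

rng = random.Random(0)

# Hypothetical sampled value per dollar: same relative spread, different scale.
charity_a = [0.004 * rng.lognormvariate(0, 0.2) for _ in range(20_000)]
charity_b = [0.07 * rng.lognormvariate(0, 0.2) for _ in range(20_000)]


def normalize(samples):
    """Divide each sample by the mean, giving the fraction of expected value achieved."""
    mean = fmean(samples)
    return [s / mean for s in samples]


norm_a = sorted(normalize(charity_a))
norm_b = sorted(normalize(charity_b))

# After normalizing, the 95th percentiles are directly comparable.
p95_a = norm_a[int(0.95 * len(norm_a))]
p95_b = norm_b[int(0.95 * len(norm_b))]
print(p95_a, p95_b)
```

Both normalized sample sets have mean 1 and nearly identical percentiles, which is why the normalized plot shows spread without being dominated by differences in scale.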

Probability distributions for percentage of expected value obtained with each of GiveWell's top charities

Section recap

Our preliminary conclusion is that the cost-effectiveness estimates for all of GiveWell's top charities have similar uncertainty, with GiveDirectly being a bit more certain than the rest. However, this is mostly an artifact of pretending that we are exactly equally uncertain about each input parameter.

Sensitivity analysis of GiveWell's cost-effectiveness estimates

Section overview

In the previous section, we introduced GiveWell's cost-effectiveness analysis which uses a spreadsheet model to take point estimates of uncertain input parameters to point estimates of uncertain results. We adjusted this approach to take probability distributions on the input parameters and in exchange got probability distributions on the resulting cost-effectiveness estimates. But this machinery lets us do more. Now that we've completed an uncertainty analysis, we can move on to sensitivity analysis.

The basic idea of sensitivity analysis is, when working with uncertain values, to see which input values most affect the output when they vary. For example, if you have the equation $f(a, b) = 2^{a} + b$ and each of $a$ and $b$ varies uniformly over the range from 5 to 10, $f(a, b)$ is much more sensitive to $a$ than to $b$: across that range, $2^{a}$ moves from 32 to 1024 while $b$ moves by only 5. A sensitivity analysis is practically useful in that it can offer you guidance as to which parameters in your model it would be most useful to investigate further (i.e. to narrow their uncertainty).
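
The toy equation can be checked numerically. The sketch below uses the share of output variance attributable to each term as a simple stand-in for sensitivity. That is an assumption made for illustration and is not the delta measure used later in this post.

```python
import random
from statistics import pvariance

rng = random.Random(0)
n = 50_000

# Both inputs vary uniformly over [5, 10], as in the example.
a = [rng.uniform(5, 10) for _ in range(n)]
b = [rng.uniform(5, 10) for _ in range(n)]

f = [2 ** ai + bi for ai, bi in zip(a, b)]

# The inputs are independent, so the output variance is (approximately, in a
# finite sample) the sum of the variance of each term.
var_total = pvariance(f)
share_a = pvariance([2 ** ai for ai in a]) / var_total
share_b = pvariance(b) / var_total

print(f"share of variance from a: {share_a:.5f}")
print(f"share of variance from b: {share_b:.5f}")
```

Almost all of the variance in $f(a, b)$ comes from the $2^{a}$ term, so narrowing our uncertainty about $b$ would barely change our uncertainty about the output.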

Visual (scatter plot) and delta moment-independent sensitivity analysis on GiveWell's cost-effectiveness models show which input parameters the cost-effectiveness estimates are most sensitive to. Preliminary results (given our input uncertainty) show that some input parameters are much more influential on the final cost-effectiveness estimates for each charity than others.

Visual sensitivity analysis

The first kind of sensitivity analysis we'll run is just to look at scatter plots comparing each input parameter to the final cost-effectiveness estimates. We can imagine these scatter plots as the result of running the following procedure many times^[4]: sample a single value from the probability distribution for each input parameter and run the calculation on these values to determine a result value. If we repeat this procedure enough times, the collected samples start to approximate the true probability distributions.
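
The procedure can be sketched in a few lines. Everything named here is hypothetical: `toy_model` is an invented three-input function, every input gets the same log-normal distribution with a 90% interval of 0.8 to 1.2, and the Pearson correlation is used as a rough numeric proxy for the "directionality" one would otherwise eyeball from a scatter plot.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def toy_model(strong, weak, negligible):
    # Hypothetical model: equal input uncertainty, unequal influence.
    return strong * weak ** 0.25 + 0.01 * negligible


# Log-normal parameters for a 90% interval of 0.8 to 1.2 on every input.
z90 = NormalDist().inv_cdf(0.95)
mu = (math.log(0.8) + math.log(1.2)) / 2
sigma = (math.log(1.2) - math.log(0.8)) / (2 * z90)

rng = random.Random(0)
names = ["strong", "weak", "negligible"]

# One draw = one value per input; all inputs vary together in every run.
draws = [{name: rng.lognormvariate(mu, sigma) for name in names} for _ in range(20_000)]
outputs = [toy_model(**d) for d in draws]

# (input value, output value) pairs: the points of one scatter plot per input.
scatter = {name: [(d[name], y) for d, y in zip(draws, outputs)] for name in names}
directionality = {
    name: pearson([x for x, _ in pts], [y for _, y in pts])
    for name, pts in scatter.items()
}
print(directionality)
```

The points in `scatter` can be handed to any plotting library. Even though all three inputs are equally uncertain, the correlation is close to 1 for `strong`, modest for `weak` and near 0 for `negligible`, which mirrors how the structure of the model drives the plots below.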

(One nice feature of this sort of analysis is that we see how the output depends on a particular input even in the face of variations in all the other inputs—we don't hold everything else constant. In other words, this is a global sensitivity analysis.)

(Caveat: We are again pretending that we are equally uncertain about each input parameter and the results reflect this limitation. To see the analysis result for different input uncertainties, edit and run the Jupyter notebook.)

Direct cash transfers

GiveDirectly

Scatter plots showing sensitivity of GiveDirectly's cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive (i.e. the scatter plot for these parameters shows the greatest directionality) to the input parameters:

Highlighted input factors to which result is highly sensitive

Input	Type of uncertainty	Meaning/importance
value of increasing ln consumption per capita per annum	Moral	Determines final conversion between empirical outcomes and value
transfer as percent of total cost	Operational	Determines cost of results
return on investment	Opportunities available to recipients	Determines stream of consumption over time
baseline consumption per capita	Empirical	Diminishing marginal returns to consumption mean that baseline consumption matters

Deworming

Some useful and non-obvious context for the following is that the primary putative benefit of deworming is increased income later in life.

The END Fund

Scatter plots showing sensitivity of the END Fund's cost-effectiveness to each input parameter

Here, it's a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

Input	Type of uncertainty	Meaning/(un)importance
num yrs between deworming and benefits	Forecast	Affects how much discounting of future income streams must be done
duration of long-term benefits	Forecast	The length of time for which a person works and earns income
expected value from leverage and funging	Game theoretic	How much does money donated to the END Fund shift around other money

Deworm the World

Scatter plots showing sensitivity of Deworm the World's cost-effectiveness to each input parameter

Again, it's a little harder to identify certain factors as more important. It seems that the final estimate is (given our input uncertainty) the result of many factors of medium effect. It does seem plausible that the output is somewhat less sensitive to these factors:

Highlighted input factors to which result is minimally sensitive

Input	Type of uncertainty	Meaning/(un)importance
num yrs between deworming and benefits	Forecast	Affects how much discounting of future income streams must be done
duration of long-term benefits	Forecast	The length of time for which a person works and earns income
expected value from leverage and funging	Game theoretic	How much does money donated to Deworm the World shift around other money

Schistosomiasis Control Initiative

Scatter plots showing sensitivity of Schistosomiasis Control Initiative's cost-effectiveness to each input parameter

Highlighted input factors to which result is minimally sensitive

Input	Type of uncertainty	Meaning/(un)importance
num yrs between deworming and benefits	Forecast	Affects how much discounting of future income streams must be done
duration of long-term benefits	Forecast	The length of time for which a person works and earns income
expected value from leverage and funging	Game theoretic	How much does money donated to Schistosomiasis Control Initiative shift around other money

Sightsavers

Scatter plots showing sensitivity of Sightsavers' cost-effectiveness to each input parameter

Highlighted input factors to which result is minimally sensitive

Input	Type of uncertainty	Meaning/(un)importance
num yrs between deworming and benefits	Forecast	Affects how much discounting of future income streams must be done
duration of long-term benefits	Forecast	The length of time for which a person works and earns income
expected value from leverage and funging	Game theoretic	How much does money donated to Sightsavers shift around other money

Seasonal malaria chemoprevention

Malaria Consortium

Scatter plots showing sensitivity of Malaria Consortium's cost-effectiveness to each input parameter

Highlighted input factors to which result is highly sensitive

Input	Type of uncertainty	Meaning/importance
direct mortality in high transmission season	Empirical	Fraction of overall malaria mortality during the peak transmission season and amenable to SMC
internal validity adjustment	Methodological	How much do we trust the results of the underlying SMC studies
external validity adjustment	Methodological	How much do the results of the underlying SMC studies transfer to new settings
coverage in trials in meta-analysis	Historical/methodological	Determines how much coverage an SMC program needs to achieve to match studies
value of averting death of a young child	Moral	Determines final conversion between empirical outcomes and value
cost per child targeted	Operational	Affects cost of results

Vitamin A supplementation

Helen Keller International

Scatter plots showing sensitivity of Helen Keller International's cost-effectiveness to each input parameter

The scatter plots show that, given our choice of input uncertainty, the output is most sensitive to the input parameters:

Highlighted input factors to which result is highly sensitive

Input	Type of uncertainty	Meaning/importance
relative risk of all-cause mortality for young children in programs	Causal	How much do VAS programs affect mortality
cost per child per round	Operational	Affects cost of results
rounds per year	Operational	Affects cost of results

Bednets

Against Malaria Foundation

Scatter plots showing sensitivity of Against Malaria Foundation's cost-effectiveness to each input parameter

Highlighted input factors to which result is highly sensitive

Input	Type of uncertainty	Meaning/importance
num LLINs distributed per person	Operational	Affects cost of results
cost per LLIN	Operational	Affects cost of results
deaths averted per protected child under 5	Causal	How effective is the core activity
lifespan of an LLIN	Empirical	Determines how many years of benefit accrue to each distribution
net use adjustment	Empirical	Determines benefits from LLIN as mediated by proper and improper use
internal validity adjustment	Methodological	How much do we trust the results of the underlying studies
percent of mortality due to malaria in AMF areas vs trials	Empirical/historical	Affects size of the problem
percent of pop. under 5	Empirical	Affects size of the problem

Delta moment-independent sensitivity analysis

If eyeballing plots seems a bit unsatisfying to you as a method for judging sensitivity, not to worry. We also have the results of a more formal sensitivity analysis. This method is called delta moment-independent sensitivity analysis.

$δ_{i}$ (the delta moment-independent sensitivity indicator of parameter $i$ ) "represents the normalized expected shift in the distribution of [the output] provoked by [that input]". To make this meaning more explicit, we'll start with some notation/definitions. Let:

$X = (X_{1}, X_{2}, \dots, X_{n}) \in \mathbb{R}^{n}$ be the random variables used as input parameters
$Y = f(X)$, where $f$ is a function from $\mathbb{R}^{n}$ to $\mathbb{R}$ describing the relationship between inputs and outputs, i.e. GiveWell's cost-effectiveness model
$f_{Y}(y)$ be the density function of the result $Y$, i.e. the probability distributions we've already seen showing the cost-effectiveness for each charity
$f_{Y | X_{i}}(y)$ be the conditional density of $Y$ with one of the parameters $X_{i}$ fixed, i.e. a probability distribution for the cost-effectiveness of a charity while pretending that we know one of the input values precisely

With these in place, we can define $δ_{i}$ . It is:

$δ_{i} = \frac{1}{2} E_{X_{i}} [\int | f_{Y} (y) - f_{Y | X_{i}} (y) | d y]$ .

The inner $\int | f_{Y} (y) - f_{Y | X_{i}} (y) | d y$ can be interpreted as the total area between probability density function $f_{Y}$ and probability density function $f_{Y | X_{i}}$ . This is the "shift in the distribution of $Y$ provoked by $X_{i}$ " we mentioned earlier. Overall, $δ_{i}$ then says:

pick one value for $X_{i}$ and measure the shift in the output distribution from the "default" output distribution
do that for each possible value of $X_{i}$ and take the expectation, weighting each value by its probability

Some useful properties to point out:

$δ_{i}$ ranges from 0 to 1
If the output is independent of the input, $δ_{i}$ for that input is 0
The sum of $δ_{i}$ for each input considered separately isn't necessarily 1 because there can be interaction effects
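
To show how the definition turns into a computation, here is a crude binned estimator of $δ_{i}$ from Monte Carlo samples. It is a sketch under stated assumptions and not necessarily the estimator behind the plots that follow. `delta_index`, the class and bin counts, and the toy inputs (`strong`, `moderate`, `unused`) are all hypothetical. Conditioning on an exact value of $X_{i}$ is approximated by conditioning on an equal-count slice of its sorted samples, and densities are approximated by probability masses over shared bins of $Y$.

```python
import bisect
import random


def delta_index(xs, ys, n_classes=10, n_bins=20):
    """Crude binned estimate of the delta sensitivity of outputs `ys` to input `xs`."""
    n = len(ys)
    sorted_ys = sorted(ys)
    # Shared bin edges at quantiles of Y, so each bin holds about 1/n_bins of the mass.
    edges = [sorted_ys[(j * n) // n_bins] for j in range(1, n_bins)]

    def bin_masses(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [c / len(values) for c in counts]

    p_all = bin_masses(ys)  # stands in for f_Y
    # Sort outputs by the input, then cut into equal-count classes of X_i
    # (any remainder when n is not divisible by n_classes is dropped).
    ys_by_x = [y for _, y in sorted(zip(xs, ys))]
    size = n // n_classes
    shifts = []
    for c in range(n_classes):
        p_cond = bin_masses(ys_by_x[c * size:(c + 1) * size])  # stands in for f_{Y|X_i}
        # Binned approximation of the area between the two densities.
        shifts.append(sum(abs(p - q) for p, q in zip(p_all, p_cond)))
    # Classes have equal probability, so the expectation is a plain average.
    return 0.5 * sum(shifts) / n_classes


rng = random.Random(0)
n = 40_000
strong = [rng.lognormvariate(0, 0.30) for _ in range(n)]
moderate = [rng.lognormvariate(0, 0.15) for _ in range(n)]
unused = [rng.lognormvariate(0, 0.30) for _ in range(n)]  # never enters the model
ys = [s * m for s, m in zip(strong, moderate)]

deltas = {
    "strong": delta_index(strong, ys),
    "moderate": delta_index(moderate, ys),
    "unused": delta_index(unused, ys),
}
print(deltas)
```

The estimates stay between 0 and 1 and rank the inputs as expected. The value for `unused` is small but not exactly 0, because finite samples make the conditional histograms differ slightly from the unconditional one even when the output is independent of the input.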

In the plots below, for each charity, we visualize the delta sensitivity (and our uncertainty about that sensitivity) for each input parameter.