Effort: This took about 40 hours to research and write, excluding time spent developing Squiggle.
Disclaimer: Opinions do not represent the Quantified Uncertainty Research Institute nor GiveWell. I talk about GiveWell's intentions with the GiveDirectly model. However, my understanding may not be complete. All mistakes are my own.
Epistemic Status: I am uncertain about these uncertainties! Most of them are best guesses. I could also be wrong about the inconsistencies I've identified. A lot of these issues could easily be considered bike-shedding.
Target Audiences: I wrote this post for:
- People who are interested in evaluating interventions.
- People who are interested in the quantification of uncertainty.
- EA software developers who are interested in open source projects.
TLDR: I've transposed GiveWell's Cost-Effectiveness Analysis of GiveDirectly into an interactive notebook. This format allows us to measure our uncertainty about GiveDirectly's cost-effectiveness. The model finds that the 95% credible interval for GiveDirectly's cost-effectiveness spans about an order of magnitude, which I deem a relatively low level of uncertainty. Additionally, I found an internal inconsistency in the model; fixing it increases GiveDirectly's cost-effectiveness by 11%.
The notebook is quite long, detailed and technical. Therefore, I present a summary in this post.
This model uses Squiggle, an in-development language for estimation and evaluation, developed by myself and others at the Quantified Uncertainty Research Institute. We'll write more about the language itself in future posts, especially as it becomes more stable.
GiveWell's cost-effectiveness analyses (CEAs) of top charities are often considered the gold standard. However, they still have room for improvement. One such improvement is the quantification of uncertainty. I created a Squiggle Notebook that investigates this for the GiveDirectly CEA. This notebook also serves as an example of Squiggle and what's possible with future CEAs.
In GiveWell's CEAs, GiveDirectly is used as a benchmark to evaluate other interventions. All other charities' effectiveness is measured relative to GiveDirectly. For example, as of 2022, the Against Malaria Foundation was calculated to be 7.1x to 15.4x as cost-effective as GiveDirectly. Evidence Action's Deworm the World is considered 5.3x to 38.2x as cost-effective. GiveDirectly makes a good benchmark because unconditional cash transfers have a strong (some might even say tautological) case behind their effectiveness. Because it is the benchmark, GiveDirectly is a good place to start quantifying uncertainty. I also focus on GiveDirectly because it has the simplest CEA.
Quantifying uncertainty can:
- Improve people's understanding of how much evidence we have behind interventions.
- Help us judge the effectiveness of further research on an intervention using the Value of Information.
- Allow us to forecast parameters and better determine how wrong we were about them, so that we can correct them over time.
Cole Haus has done similar work quantifying uncertainty on GiveWell models in Python.
The primary decision in this work is choosing how much uncertainty each parameter has. I decided on this with two different methods:
- If there was enough information about the parameter, I performed a formal Bayesian update.
- If there wasn't as much information, I guessed it with the help of Nuño Sempere, a respected forecaster. These estimates are simple, and future researchers could better estimate them.
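To make "formal Bayesian update" concrete, here is a minimal sketch of one common kind of update: a Beta prior on a proportion, updated with binomial data. This is only an illustration. The notebook's actual updates may use different distributions, and the prior and the counts below are hypothetical numbers, not figures from the GiveDirectly model.

```python
def beta_binomial_update(prior_alpha, prior_beta, successes, trials):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    failures = trials - successes
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: a uniform Beta(1, 1) prior on some proportion,
# then 18 "successes" observed out of 20 trials.
alpha, beta = beta_binomial_update(1.0, 1.0, successes=18, trials=20)
posterior_mean = beta_mean(alpha, beta)
print(alpha, beta, posterior_mean)
```

The posterior is still a full distribution, so it can be carried through the rest of a model rather than being collapsed to a point estimate.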
Methodology and calculations are in my Squiggle notebook.
I designed the notebook to read standalone if you are technically minded and like the specifics. However, it is long, and it may be beneficial to look over this blog post first as the notebook achieves a large number of aims at the same time.
GiveWell bases GiveDirectly's effectiveness on the premise that poorer people value money more. To capture this, GiveWell models how much it costs GiveDirectly to double someone's consumption for a year, where consumption is the resources a person uses, measured in dollars. For instance, the baseline consumption is $285.92 a year. Increasing an individual's consumption from $285.92 to $571.84 for a year would be considered 1 unit of value. If someone consumes less, it's cheaper to double their consumption and, therefore, more cost-effective to give to them.
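As a rough sketch of this unit of value, the snippet below counts doublings of consumption as the base-2 logarithm of the ratio between new and baseline consumption. The $285.92 baseline comes from the text. The $100 transfer and the $1,000 comparison baseline are hypothetical numbers chosen only to show why the same transfer is worth more to a poorer recipient.

```python
import math


def doublings_of_consumption(baseline, new):
    """Units of value under the logarithmic convention: log2(new / baseline)."""
    return math.log2(new / baseline)


baseline = 285.92  # baseline annual consumption in dollars (from the text)

# Doubling consumption for a year is exactly one unit of value.
one_doubling = doublings_of_consumption(baseline, 2 * baseline)
print(one_doubling)  # 1.0

# Hypothetical: the same $100 a year is worth more doublings to a poorer person.
transfer = 100.0
poorer = doublings_of_consumption(baseline, baseline + transfer)
richer = doublings_of_consumption(1000.0, 1000.0 + transfer)
print(poorer, richer)
```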
For the most part, this notebook is a faithful transcription of the GiveWell models into Squiggle Notebooks.
Based on our model, the mean cost to double someone's consumption for a year is $468.93, with a 95% credible interval from $130.56 to $1,185.44. Relative to other interventions, I believe this to be very low uncertainty around GiveDirectly's cost-effectiveness.
Most of this uncertainty is likely due to our estimates of parameters rather than the formal Bayesian analysis. However, I currently don't know exactly how much. I'd be interested in extending this work to answer this question in the future.
I made some changes to the model when representing it in Squiggle. These changes were either stylistic or made to remove internal inconsistencies within the model. Most of these changes addressed minor inconsistencies and did not change GiveDirectly's cost-effectiveness by any significant margin. I list these in the Squiggle Notebook.
However, an internal inconsistency impacted the cost-effectiveness by a more significant margin. This change was about isoelastic utility. Reading into the details requires understanding the specifics of the GiveDirectly model.
In summary, fixing the internal inconsistency increases GiveDirectly's cost-effectiveness by 11%, changing the mean cost of doubling consumption for a year from $468.61 to $415.87 with a 95% credible interval from $119.54 to $962.49.
Isoelastic vs Logarithmic increases in consumption
GiveWell measures GiveDirectly's cost-effectiveness by doublings of consumption per year. For instance, increasing an individual's consumption from $285.92 to $571.84 for a year would be considered 1 unit of utility.
However, there is a bit of ambiguity here. What happens when you double someone's consumption twice? For instance, $285.92 to $1,143.68 a year? So, if increasing consumption by 100% has a utility of 1, how much utility does this have? A natural answer might be that doubling twice creates two units of utility. This answer is what the GiveWell CEA currently assumes. This assumption is implicit by representing utility as logarithmic increases in consumption.
However, how much people prefer the first double of consumption over the second is an empirical question. And empirically, people get more utility from the first doubling of consumption than the second. Doubling twice creates 1.66 units rather than 2. An isoelastic utility function can represent this difference in preferences.
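The 1.66 figure can be reproduced with a short sketch. It assumes the standard isoelastic (constant relative risk aversion) utility function, uses the eta = 1.59 value that GiveWell's discount-rate note quotes later in this post, and normalises utility so that one doubling from the baseline equals one unit. The function names are my own.

```python
import math


def isoelastic_utility(consumption, eta):
    """Standard isoelastic utility; reduces to log utility when eta == 1."""
    if eta == 1:
        return math.log(consumption)
    return (consumption ** (1 - eta) - 1) / (1 - eta)


def units_of_value(baseline, new, eta):
    """Utility gain, normalised so one doubling from baseline is 1 unit."""
    gain = isoelastic_utility(new, eta) - isoelastic_utility(baseline, eta)
    one_doubling = (isoelastic_utility(2 * baseline, eta)
                    - isoelastic_utility(baseline, eta))
    return gain / one_doubling


baseline = 285.92  # dollars per year, from the text

# Doubling twice ($285.92 -> $1,143.68) under each assumption.
log_units = units_of_value(baseline, 4 * baseline, eta=1)
iso_units = units_of_value(baseline, 4 * baseline, eta=1.59)
print(round(log_units, 2))  # 2.0
print(round(iso_units, 2))  # 1.66
```

The ratio does not depend on the baseline, because the baseline's contribution cancels between the numerator and the denominator.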
Isoelastic utility functions allow you to specify how much one would prefer the first double to the second with the parameter η. When η = 1, the recipient values the first double the same as the second, which is what the current CEA assumes. When η > 1, recipients value the first doubling in consumption more than the second. Empirically, η > 1. GiveWell recognises this and uses η = 1.59 in their calculation of the discount rate:
Increases in consumption over time meaning marginal increases in consumption in the future are less valuable. We chose a rate of 1.7% based on an expectation that economic consumption would grow at 3% each year, and the function through which consumption translates to welfare is isoelastic with eta=1.59. (Note that this discount rate should be applied to increases in ln(consumption), rather than increases in absolute consumption; see calculations here)
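One way to arrive at a figure close to 1.7% is sketched below. This is my own reconstruction, not GiveWell's linked calculation, which is not reproduced here. It assumes that under isoelastic utility the value of a marginal increase in ln(consumption) is proportional to consumption raised to the power (1 - eta), so each year of 3% growth scales that value by 1.03 ** (1 - eta).

```python
growth = 0.03  # assumed annual growth in consumption (from the quote)
eta = 1.59     # isoelastic parameter (from the quote)

# Under isoelastic utility u(c), du / d ln(c) = c * u'(c) = c ** (1 - eta).
# After one year of growth, the same increase in ln(consumption) is worth
# (1 + growth) ** (1 - eta) times as much as it is today.
relative_value_next_year = (1 + growth) ** (1 - eta)
discount_rate = 1 - relative_value_next_year

print(f"{discount_rate:.2%}")  # roughly 1.7%
```

With eta = 1 the exponent is zero and the discount rate vanishes, which matches the logarithmic case where every doubling is worth the same regardless of the starting level.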
I've created a Desmos calculator to explore this concept. From the calculator, you can change the value of η and see how isoelastic utility compares to logarithmic utility.
GiveDirectly transfers don't usually double someone's consumption for a year but increase it by a lower factor. For increases smaller than a doubling, an isoelastic utility function that values the first doubling more than the second assigns recipients more utility than logarithmic utility implies.
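The sketch below compares the two utility functions for a few increase factors, with both normalised so that a full doubling is worth one unit. The factors are hypothetical and are not taken from the GiveDirectly model. The value eta = 1.59 is the one GiveWell quotes, and the function name is my own.

```python
import math


def relative_value(factor, eta):
    """Value of multiplying consumption by `factor`, where a doubling = 1 unit."""
    if eta == 1:
        return math.log2(factor)  # logarithmic utility
    return (1 - factor ** (1 - eta)) / (1 - 2 ** (1 - eta))


# Hypothetical increase factors; only the comparison matters.
for factor in (1.1, 1.25, 1.5, 2.0, 4.0):
    log_value = relative_value(factor, eta=1)
    iso_value = relative_value(factor, eta=1.59)
    print(f"x{factor}: log={log_value:.3f} isoelastic={iso_value:.3f}")
```

Below a factor of 2 the isoelastic value is the larger of the two, at exactly 2 they agree by construction, and above 2 the logarithmic value is larger.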
Changing η from 1 to 1.59 increases GiveDirectly's cost-effectiveness by 11%, reducing the cost of doubling someone's consumption for a year from $466.34 to $415.87.
I completed this work with help and funding from the Quantified Uncertainty Research Institute. Thanks to Nuño Sempere, Ozzie Gooen, David Reinstein, Quinn Dougherty, Misha Yagudin, and Edo Arad for their feedback.
Appendix A - Squiggle
Here we look into the technology that made this possible, Squiggle!
Squiggle is a language developed by the Quantified Uncertainty Research Institute for forecasting and estimating. It is in a pre-alpha stage and has many known and unknown bugs. I will detail some of the benefits of using Squiggle for this evaluation. I hope that this might encourage people to be interested in seeing more Squiggle evaluations.
This part of the analysis is simply a comparison between the two formats: GiveWell's Google Sheets model and the Squiggle notebook.
Quantification of Uncertainty
The first and most obvious change is that Squiggle allows you to represent uncertainty with distributions.
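For readers who have not worked with distributions in a model before, here is a rough Python analogy of what this means. Squiggle does this natively; the code below is not Squiggle and not the notebook's method. It replaces two point estimates with lognormal distributions, propagates them by Monte Carlo sampling, and reads off a mean and a 95% interval. Every input number is hypothetical and unrelated to the GiveDirectly model.

```python
import math
import random
import statistics

Z_95 = 1.6448536269514722  # standard normal 95th percentile


def sample_lognormal_90(rng, low, high):
    """Draw from a lognormal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return rng.lognormvariate(mu, sigma)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000

# Hypothetical uncertain inputs (NOT values from the GiveDirectly model).
cost_per_recipient = [sample_lognormal_90(rng, 300, 600) for _ in range(n)]
doublings_per_recipient = [sample_lognormal_90(rng, 0.5, 2.0) for _ in range(n)]

# Propagate uncertainty by computing the output once per sample.
cost_per_doubling = sorted(
    c / d for c, d in zip(cost_per_recipient, doublings_per_recipient)
)

mean = statistics.fmean(cost_per_doubling)
median = statistics.median(cost_per_doubling)
lower = cost_per_doubling[int(0.025 * n)]  # approximate 2.5th percentile
upper = cost_per_doubling[int(0.975 * n)]  # approximate 97.5th percentile
print(f"mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

A spreadsheet cell holds one number per parameter, so this interval is exactly the information that a point-estimate model cannot report.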
Squiggle references other variables by their names rather than by their cell references, making it easier to read different parts of an analysis individually:
Squiggle notebooks allow for text and explanations that can be linked and are easier to read than notes in Google Sheets.
Forking and change requests
Both Google Sheets and Squiggle Notebooks allow you to make copies of documents and edit them yourself. However, Squiggle Notebooks also support merge requests, which ask for the original document to be changed and include a detailed diff of the changes made.
You can edit and interact with Squiggle notebooks to change different parts of the analysis. For example, I use this to create variants of the notebook that can be enabled and disabled with checkboxes.
Reasons not to use Squiggle
Do we need another platform?
Software is not always the solution to every problem! For example, Squiggle doesn't help with a lot of the main difficulties of modelling (constructing paths to impact or estimating parameters). I do agree with this to an extent. I was able to do this work because GiveWell has already done the heavy lifting for me in constructing their model's path to impact and estimating appropriate values of parameters. However, I think that pieces of information are much more accessible now that we have Squiggle Notebooks. The most obvious is measuring uncertainty.
One may also reply that if you wanted uncertainty, why not Causal or Guesstimate? In honesty, I could complete all of the research outcomes listed here with either of these two platforms. However, I see possibilities that Squiggle can far exceed the capabilities of these two platforms, mainly through the use of functions and its flexibility as a custom DSL for estimation. This analysis did not use these features, but Ozzie discussed them in a past post on Squiggle.
Limited target audience
One of the arguments against Squiggle is that it has a limited target audience. We built the tool for people who are interested in calculating with uncertainty but do not want to use tools such as probabilistic programming languages (PPLs).
PPLs are tools that can do a lot more than Squiggle currently can, particularly inference. I haven't worked with many PPLs, but I have worked with Stan.
I believe Squiggle and PPLs have different use cases. I have tried to reproduce the above work in Stan rather than Squiggle, and the result is far from glamorous. However, I am not proficient enough in PPLs to make a fair comparison.