Effort: About 80+ hours of dev work, and a little bit of writing.
This project was done under a grant from EA Funds. I would like to give thanks to Ozzie Gooen, Nuño Sempere, Chris Dirkis, Quinn Dougherty and Evelyn Fitzgerald for their comments and support on this work.
Intended Audience: This work is interesting for:
- People who evaluate interventions, or want to get into evaluating interventions, as well as people looking for ambitious software engineering projects in the EA space.
- People looking to evaluate whether projects similar to Pedant are worth funding.
Conflict of Interest: This report is intended as a fair account of the work I’ve done so far on Pedant, including its advantages and disadvantages. However, I would love to be funded to work on projects similar to Pedant in the future, partly because this is the kind of work I’m good at. So although I will try to be unbiased in my approach, I should declare that this report may have you see Pedant through rose-coloured glasses.
Tldr: I am building a computer language called Pedant, which is designed for writing cost effectiveness calculations. It can check for missing assumptions and errors within your calculations. It is statically typed, and its type checker includes a dimensional checker.
State of Cost Effectiveness Analysis
When deciding which intervention to choose over another, one of the gold standards is to be handed a cost effectiveness analysis showing that your dollar goes further on intervention A than on intervention B.
A cost effectiveness analysis also offers the opportunity to disagree with a calculation, to critique the values of parameters, and incorporate and adjust other considerations.
However, when looking at EA’s CEAs in the wild, there are many things that are lacking. I’m going to detail what I see as problems, and introduce my solution that could help improve the quality and quantity of CEAs.
The Ceiling of CEAs
Looking at the CEAs that founded EA, particularly GiveWell’s, I can identify a collection of improvements that would be lovely to see, even though these CEAs are incredible and miles ahead of anything else we have.
Before going further, I should add the disclaimer that I’m not claiming that GiveWell’s work is of low quality. What GiveWell has done was well and truly miles ahead of its time, and honestly still is, but that doesn’t mean there are no possible improvements. Furthermore, I may have a different philosophy from GiveWell’s, as I would definitely consider myself more of a Sequence Thinker than a Cluster Thinker.
The first and most clear need for improvement is that of formally considering uncertainty. GiveWell calculations do not consider uncertainty in their parameters, and therefore do not consider uncertainty in their final results. GiveWell’s discussion of uncertainty is often qualitative, saying that deworming is “very uncertain”, and not going much further than that.
This issue has been identified, and Cole Haus did an incredible job of quantifying the uncertainty in GiveWell CEAs. This work hasn’t yet been incorporated into GiveWell’s research.
Considering uncertainty in calculations can be done within Excel and other spreadsheets, and some (particularly expensive) industrial options such as Oracle Crystal Ball and @RISK are available. Currently, Guesstimate, created by our own Ozzie Gooen, is a great option for considering uncertainties in a spreadsheet-like fashion.
The next few issues are smaller and much more pedantic. When working with classical tools, it’s very easy to introduce errors into calculations. For example, looking through GiveDirectly’s Cost Effectiveness Analysis, I was able to identify two implicit assumptions in the calculation. In particular, I found that:
- The period of time over which the payment was initially consumed was implicitly assumed to be 1 year.
- The period of time where the recipient got some investment back was implicitly assumed to be 1 year.
These assumptions are included within the calculation without explicitly declaring that they exist. I was able to identify that these hidden assumptions existed because they are both in the unit “year”, and the calculations weren’t dimensionally valid unless I included them.
I would prefer these to be parameters, and not implicit assumptions. That way they can be questioned and the model refined.
Further, there were also a couple of what I would call errors, notably:
- Addition and subtraction of adjustment percentages. The correct way to handle this would be multiplication.
- In the first year of the transfer, the increase in welfare due to money transferred is actually slightly larger than it should be.
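To see why adding adjustment percentages is a problem, here is a small Python sketch using hypothetical adjustments (not GiveWell’s actual figures). Summing the percentages first ignores that each adjustment applies to what remains after the others. With large enough downward adjustments, the sum can even produce a negative result. Multiplying the factors avoids both problems.

```python
def combine_additive(adjustments):
    """Sum the fractional adjustments, then apply them once (the problematic approach)."""
    return 1 + sum(adjustments)


def combine_multiplicative(adjustments):
    """Apply each fractional adjustment to whatever remains after the previous ones."""
    factor = 1.0
    for adjustment in adjustments:
        factor *= 1 + adjustment
    return factor


# Hypothetical downward adjustments of 30% and 20%.
mild = [-0.3, -0.2]
print(round(combine_additive(mild), 2))        # 0.5
print(round(combine_multiplicative(mild), 2))  # 0.56

# With larger hypothetical adjustments, the additive version goes negative.
large = [-0.5, -0.6]
print(round(combine_additive(large), 2))        # -0.1
print(round(combine_multiplicative(large), 2))  # 0.2
```

For small adjustments the two approaches give similar answers. They diverge as the adjustments grow.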
This is what I’ve identified so far for GiveDirectly. I’m not sure how many of these types of issues exist in the other CEAs, but after some initial investigations I think GiveDirectly might be an outlier.
Finally, the analysis is presented to you in an enormous spreadsheet, which readers often find difficult to parse and whose logic is hard to follow. As such, many EAs never really consult or explore it, and simply accept GiveWell’s analysis as gospel.
Again, I’m far from saying it’s on GiveWell to fix this. Their work is incredible, and most of the consequential decisions needed for good CEAs are parameter estimates, not these more technical issues. However, I’m very much a pedant when it comes to my math and theory, so it would be nice to see whether there is a way to lift the ceiling of CEAs in that area.
The Floor of CEAs
That being said, in practice, the quality of GiveWell’s work is a clear outlier. In the rest of the EA space, quantitative CEAs seem to be done rarely, if at all. Since 2019, ACE has retracted its numerical cost effectiveness analyses and turned to qualitative cost effectiveness models. It cited difficulties in modeling, and numbers that were interpreted as too certain.
By “The floor of CEAs”, I’m referring to initial numerical evaluations of organizations that have not yet been evaluated, or that have received little evaluation. This floor is not difficult to find, and for all practical purposes might as well be “not GiveWell”.
In the longtermist world, I would consider Nuño’s evaluation of EA Wiki, shallow evaluations of longtermist organizations and evaluations of 2018-2019 EA Funds grantees to be the state of the art. To be clear, considering how little work has been done in this field, the most obvious thing it needs is more people to try, make mistakes and learn.
When I talked to Nuño about creating these evaluations, he said that any “software that automates or allows for scaling any part of evaluations” would be useful. He would also appreciate tooling that can create sophisticated probabilistic estimates and visualise them.
Thankfully, as you may have guessed from the title, I’ve attempted to do some groundwork on lifting both the floor and the ceiling of Cost Effectiveness Analysis.
Pedant is a language for Cost-Effectiveness Analysis, and other important calculations.
It’s good to think of Pedant as a replacement for Excel or Guesstimate, rather than a programming language. The motivation of Pedant is threefold:
- Enable the creation of higher quality cost effectiveness analyses
- Make it easier to start making cost effectiveness analyses
- Allow cost effectiveness analyses to be compared and reused by presenting them in a similar form.
It’s being built in three stages. The first stage is to create a basic language that can identify dimensional errors and missing assumptions. The second is to consider uncertainty, and the third to consider accessibility. Each stage has been budgeted a month, with the second stage due in late January and the third in late February.
I have come to the end of the first stage of this project, so I’ll keep the discussion of the project to only this first stage. But hopefully what’s to come is just as interesting as what I have to present with this first stage.
The description and motivations of Pedant are outlined in the following sections. In “Type checking calculations”, I talk about the dimensional analysis checker and how it catches missing assumptions. In the section on abstraction, I cover the abstraction mechanisms available in Pedant.
Type checking calculations
One of the most important questions about any analysis is whether it’s actually correct. Whatever system I make, I would like it to make errors like the ones I found in GiveDirectly’s CEAs difficult to write, or at least easy to see. This is in a similar spirit to strongly typed languages, particularly functional languages, where the type system makes certain kinds of errors difficult to write.
I identified the errors in GiveDirectly’s CEAs through dimensional analysis. Dimensional analysis is simply checking that units are used consistently throughout a calculation; you can look through my past post for an introduction.
In the spirit of Haskell, Pedant investigates whether it is possible to create a type system with built-in dimensions. A type system lets you write some code and immediately have any problems pointed out to you, with a red underline and a detailed description of where you went wrong and how to fix it. This way, the user can quickly identify flaws in their reasoning, their modeling and other important considerations.
Due to the success of identifying errors/assumptions in GiveDirectly’s CEAs, I decided to create a language that does this type of checking automatically.
Identifying GiveDirectly’s implicit assumptions with Pedant
And I succeeded in doing that. Let’s take a look at some Pedant in action.
This is the cost effectiveness calculation of GiveDirectly written in Pedant. There are a lot of interesting things to take in here, so we’ll start with the basics.
The first thing to notice is that the syntax makes a lot of sense. It’s just a list of assignments, with a variable name on the left and a value on the right. However, numbers can have units written to the right of them; I built a syntax plugin for Neovim that highlights these units in green. The syntax should be very self-explanatory.
Then we have the type error, which tells the user that they cannot add two things of different dimensions. Here, baseline_consumption_per_capita has a dimension that is measured per year, but consumption_possible_by_funds has a dimension that is not.
This error prompts the user to think: what’s the problem here? Why am I getting this error? The first question is which of the two dimensions, with or without the per-year component, is the intended one. After a little thought, we would realise that baseline_consumption_per_capita certainly has a time component (the consumption number is definitely “per year”), so consumption_possible_by_funds must be in error.
You might realise that actually, consumption_possible_by_funds has no time component written into its calculation. However, the result of the calculation should depend on how quickly those funds are consumed. So Pedant has helped you identify a missing assumption.
You write this assumption in:
But then you get another error:
Here it’s complaining that the immediate increase of consumption has dimension 1 (also called dimensionless), but the rest of the expression has dimension years. What’s going on here?
Well, thinking about it, this pv value has to have a time component, as it is similar to the concept of a QALY. It is like quality multiplied by time, so the initial consumption must be wrong.
You can then correct the immediate increase of consumption variable:
This example shows how Pedant can guide you into identifying possible errors within your model through dimensional checking.
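Pedant does this checking statically, before anything is computed. To illustrate only the underlying idea, here is a minimal runtime sketch in Python; it is not Pedant’s implementation. Each quantity carries a map from base units to exponents. Multiplication and division combine the exponents, addition requires identical dimensions, and log requires a dimensionless argument. The values and the consumption_period name are hypothetical, and per-person units are ignored for simplicity.

```python
import math
from collections import Counter


class DimensionError(TypeError):
    """Raised when an operation combines incompatible dimensions."""


class Quantity:
    """A number tagged with a dimension: a map from base unit to exponent."""

    def __init__(self, value, **dims):
        self.value = value
        # Drop zero exponents so that, e.g., {"year": 0} equals {} (dimensionless).
        self.dims = {unit: power for unit, power in dims.items() if power != 0}

    def __add__(self, other):
        # Addition only makes sense between quantities of the same dimension.
        if self.dims != other.dims:
            raise DimensionError(f"cannot add {self.dims} to {other.dims}")
        return Quantity(self.value + other.value, **self.dims)

    def __mul__(self, other):
        exponents = Counter(self.dims)
        exponents.update(other.dims)  # exponents add under multiplication
        return Quantity(self.value * other.value, **exponents)

    def __truediv__(self, other):
        exponents = Counter(self.dims)
        exponents.subtract(other.dims)  # exponents subtract under division
        return Quantity(self.value / other.value, **exponents)


def log(q):
    """Logarithms are only defined for dimensionless quantities."""
    if q.dims:
        raise DimensionError(f"log needs a dimensionless argument, got {q.dims}")
    return Quantity(math.log(q.value))


# Hypothetical values; per-person units are ignored to keep the example small.
baseline_consumption_per_capita = Quantity(300.0, usd=1, year=-1)
consumption_possible_by_funds = Quantity(200.0, usd=1)

try:
    baseline_consumption_per_capita + consumption_possible_by_funds
except DimensionError as err:
    print(err)  # cannot add {'usd': 1, 'year': -1} to {'usd': 1}

# The missing assumption (hypothetical name): how long the funds take to consume.
consumption_period = Quantity(1.0, year=1)
consumption_with_funds = (
    baseline_consumption_per_capita + consumption_possible_by_funds / consumption_period
)

# The log of a ratio is dimensionless; multiplying by a duration gives years,
# much like quality multiplied by time in a QALY.
log_increase = log(consumption_with_funds / baseline_consumption_per_capita)
immediate_increase_of_consumption = log_increase * consumption_period
print(immediate_increase_of_consumption.dims)  # {'year': 1}
```

A static checker like Pedant’s reports the same mismatch without running the model, and points at the offending expression.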
The next interesting opportunity is abstraction: finding similar parts of an analysis and putting them into reusable components. This helps with debugging, and also with sharing and building on other people’s work. Pedant has both functions and modules as abstraction mechanisms.
Pedant supports functions, which are declared by specifying arguments after the name, similar to Haskell.
This function is a continuous version of the present value function, which is used in all of the GiveWell evaluations of effectiveness. Defining it once lets you reference and reuse this economics calculation in all other analyses.
Although no types are declared, this function is still statically typed. All types are inferred from their usage within the body (Haskell style). This makes for a clean style of function declaration.
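For reference, the integral below is a standard continuous-discounting formula. It gives the discounted length of a period of T years at a continuous discount rate r per year. I’m assuming it is close to what the Pedant function computes, though the function’s exact arguments aren’t shown here. Dimensionally, rT is dimensionless, so the result is in years:

$$\int_0^T e^{-rt}\,dt = \frac{1 - e^{-rT}}{r}$$

A small Python sketch (function names hypothetical) compares the closed form against a numerical integration:

```python
import math


def present_value_continuous(rate, duration):
    """Discounted length of time: the integral of exp(-rate * t) for t in [0, duration].

    `rate` is per year and `duration` is in years, so rate * duration is
    dimensionless and the result is in years.
    """
    if rate == 0:
        return duration  # limit as rate -> 0: no discounting
    return (1 - math.exp(-rate * duration)) / rate


def present_value_numeric(rate, duration, steps=100_000):
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    dt = duration / steps
    return sum(math.exp(-rate * (i + 0.5) * dt) for i in range(steps)) * dt


print(round(present_value_continuous(0.04, 10), 3))  # 8.242
print(round(present_value_numeric(0.04, 10), 3))     # 8.242
```

Multiplying this discounted duration by a per-year benefit gives a present value in the benefit’s own units.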
Modules are simply files that you can import into other files. The syntax is very simple:
All variables and units are automatically exported.
This allows importing of both values and units from other modules, and therefore helps with the sharing and reuse of other evaluations.
The Future of Pedant
The road map of Pedant is covered in the documentation.
There are two more stages left in the initial development of Pedant, uncertainty and a web interface.
One of the most important considerations is that of uncertainty. We are really not sure about a lot of the parameters that we use in our calculations, and that uncertainty can vary between parameters and calculations. We would like to be able to represent this uncertainty within our calculations.
The next stage will take inspiration from Ozzie’s Squiggle and Guesstimate. It will allow the user to write in a distribution of possible values for a parameter, and to run Monte Carlo simulations over the model to produce a distribution of possible results at the end.
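As a rough sketch of what this stage involves, here is a plain Monte Carlo simulation in Python. It does not use Pedant or Squiggle syntax. Each parameter is given a sampling function, and the model is evaluated once per joint draw. Percentiles of the results then summarise the uncertainty in the final answer. The parameters, distributions and model are all hypothetical.

```python
import random
import statistics


def simulate(model, samplers, n=10_000, seed=0):
    """Draw every parameter from its distribution, evaluate the model, and repeat n times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        params = {name: draw(rng) for name, draw in samplers.items()}
        results.append(model(**params))
    return results


# Hypothetical parameters, for illustration only.
samplers = {
    "cost_per_person": lambda rng: rng.uniform(50, 150),
    "benefit_per_person": lambda rng: rng.lognormvariate(0, 0.5),
}


def benefit_per_dollar(cost_per_person, benefit_per_person):
    return benefit_per_person / cost_per_person


results = simulate(benefit_per_dollar, samplers)
# statistics.quantiles with n=20 returns 19 cut points: 5%, 10%, ..., 95%.
cuts = statistics.quantiles(results, n=20)
print(f"5%: {cuts[0]:.4f}  median: {statistics.median(results):.4f}  95%: {cuts[-1]:.4f}")
```

Fixing the seed keeps runs reproducible, which matters when results are shared and compared.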
There are also interesting opportunities in integrating over uncertainty measures to do things such as calculate the value of information.
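As one illustration of this, value of information can be estimated from the same kind of Monte Carlo samples. The sketch assumes a simple choice between options whose values are sampled jointly. The expected value of perfect information (EVPI) is the expected value of picking the best option in every sample, minus the value of picking the option with the best average. The options and distributions are hypothetical.

```python
import random


def expected_value_of_perfect_information(sample_values, n=20_000, seed=1):
    """Estimate EVPI = E[max over options] - max over options of E[value].

    `sample_values(rng)` returns one joint draw of every option's value.
    """
    rng = random.Random(seed)
    draws = [sample_values(rng) for _ in range(n)]
    n_options = len(draws[0])
    # Deciding now: pick the single option with the best expected value.
    best_expected = max(sum(d[i] for d in draws) / n for i in range(n_options))
    # With perfect information: pick the best option separately in every draw.
    expected_best = sum(max(d) for d in draws) / n
    return expected_best - best_expected


# Hypothetical: option A is well understood, option B is much more uncertain.
def sample_values(rng):
    return (rng.gauss(1.0, 0.1), rng.gauss(0.9, 0.5))


print(expected_value_of_perfect_information(sample_values))
```

The estimate is never negative. It is zero when the same option is best in every sample, since then learning more could not change the decision.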
The final stage is to create some form of web interface. Currently, Pedant exists as a simple interpreter for a textual language, with the command line as its only interface. I am very aware that this is a major roadblock for people interested in getting into evaluation.
Further, requiring you to have a command line tool to explore an evaluation given to you by a colleague would be very tedious.
As such, I would like to create a web interface that allows the user to explore an analysis, as well as input their own assumptions into the model, and explore the different assumptions made and their sources.
Problems with the Pedant approach
The approach that Pedant takes is not without issues. Some of them are listed here (and hopefully more will be raised in the comments).
No focus on parameter estimation
One of the most difficult problems in creating cost-effectiveness calculations is choosing parameters. Choosing appropriate values for parameters often feels very daunting, as you have to make large guesses at values that are often not heavily researched. These parameters are also the most heavily criticised.
Pedant offers no helping hand when guessing parameters even though it could. There are a couple of ways that this could be done:
- Gathering related data and studies into databases, so that people are aware of other estimates from similar analyses. Having a very large library of Pedant CEAs, or simply of data points with uncertainties, would be helpful here. This, however, is extremely ambitious.
- Forecasting the value of parameters. For instance, in GiveWell’s GiveDirectly model, one parameter is the amount of money spent on program costs. This could easily be forecasted and evaluated on an annual basis. That would allow the uncertainty to be captured accurately, and the parameter estimates to change as new information becomes available.
I am uncertain whether projects such as these would be a better use of time than a language such as Pedant. However, it may be valuable to have a standard, portable form of cost effectiveness calculations through Pedant, which systems such as these can then plug into. Pedant is an appropriate exploration in creating this standard.
In my communications with GiveWell and others about Pedant, the most prominent message I’ve received is that the tool is too technical and complicated for someone who is just starting out.
I agree. As it stands, for the uninitiated, Google Sheets is much easier to get started with than Pedant. Further, when an analysis is complete, an Excel spreadsheet is probably more understandable and shareable than Pedant code.
Another thing of note is that a dimensional checker is a very surprising and specific tool to bring to the table. It may be more valuable to bring improvements to more general tools such as Excel spreadsheets. Pedant’s focus on dimensional checking may be poorly motivated; it was mainly driven by its success in identifying possible errors in the GiveDirectly CEAs.
It is my hope that this barrier to entry will be lowered, and Pedant’s applicability made more general, in the third stage of this project.
Why do we need another language?
Pedant is not designed to be a programming language; it’s a DSL (Domain Specific Language). Designing it as a custom language means it need not be Turing complete. It therefore executes predictably, cannot act maliciously, and cannot access external resources such as web endpoints or other interfaces. This makes it very portable, and means Pedant files can be parsed and used for many other purposes, such as performing automatic sensitivity analysis over parameters.
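A declarative model description like this could, in principle, let another tool read the parameters and run such an analysis. Here is one-at-a-time sensitivity analysis in Python, as an illustration of the technique only. The model is a toy one, written as a plain function with hypothetical parameter ranges. Extracting parameters from an actual Pedant file is not shown and would be an assumption about tooling that doesn’t exist yet.

```python
def one_at_a_time_sensitivity(model, baseline, ranges):
    """Vary each parameter to its low and high value, holding the others at baseline.

    Returns (name, output_at_low, output_at_high) rows sorted by the size of
    the swing, which is the data behind a tornado chart.
    """
    rows = []
    for name, (low, high) in ranges.items():
        out_low = model(**{**baseline, name: low})
        out_high = model(**{**baseline, name: high})
        rows.append((name, out_low, out_high))
    return sorted(rows, key=lambda row: abs(row[2] - row[1]), reverse=True)


# Hypothetical toy model and parameter ranges.
def value_per_dollar(effect, cost, adjustment):
    return effect * adjustment / cost


baseline = {"effect": 2.0, "cost": 100.0, "adjustment": 0.8}
ranges = {"effect": (1.0, 3.0), "cost": (80.0, 120.0), "adjustment": (0.7, 0.9)}

for name, low, high in one_at_a_time_sensitivity(value_per_dollar, baseline, ranges):
    print(f"{name}: {low:.4f} to {high:.4f}")
# effect: 0.0080 to 0.0240
# cost: 0.0200 to 0.0133
# adjustment: 0.0140 to 0.0180
```

One-at-a-time analysis ignores interactions between parameters. The Monte Carlo approach planned for the second stage captures those interactions better.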
I am yet to come across a language with similar design goals to Pedant, but would love to know of any that exist that can be used/improved upon.
That’s the current state of the work; hopefully I’ll have much more to show soon. Please feel free to ask questions or give feedback. If you are interested in helping develop Pedant, please contact me on the Forum, send a pull request on GitHub, or join me at EA Public Interest Technologists to discuss it.
For those curious why the first-year increase is larger than it should be: GiveDirectly’s CEA calculates wellbeing from the log of the increase in consumption, where each log increase of consumption corresponds to a certain amount of wellbeing.
Increase in consumption can come from three sources:
- Increase in initial consumption (Year 1)
- Increase in consumption from investment returns (Year 1-9)
- Increase in consumption from end of investment (Year 10)
It should be noted that consumption is modeled to increase from both initial consumption and investment returns in the first year. Write $b$ for the baseline, the amount of money that would have been consumed without the intervention. Write $c$ for the increase from initial consumption and $r$ for the increase from investment returns. The amount of consumption in the first year is then:
$$b + c + r$$
So the log increase of consumption for the first year is:
$$\ln\left(\frac{b + c + r}{b}\right)$$
However, due to the way that it’s modeled (the fact that the initial consumption is over the period of the first year seems to be ignored), this is calculated as:
Which is larger than what it should be.
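A short derivation shows why such a calculation overstates the increase. Take $b$ as baseline consumption, $c$ as the first-year increase from initial consumption and $r$ as the first-year increase from investment returns, all positive. Assume the modeled calculation takes the two log increases separately and adds them. Then:

$$\ln\left(\frac{b+c}{b}\right) + \ln\left(\frac{b+r}{b}\right) = \ln\left(1 + \frac{c+r}{b} + \frac{cr}{b^2}\right) > \ln\left(1 + \frac{c+r}{b}\right) = \ln\left(\frac{b+c+r}{b}\right)$$

The overstatement comes from the cross term $cr/b^2$, which is small when both increases are small relative to the baseline.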
The discrete version is what appears in the CEAs. I use the continuous version because the discrete one is dimensionally confusing, and I believe the continuous version also models the situation better.