Estimation for sanity checks

NunoSempere

This is a linkpost for https://nunosempere.com/blog/2023/03/10/estimation-sanity-checks/

I feel very warmly about using relatively quick estimates to carry out sanity checks, i.e., to quickly check whether something is clearly off, whether some decision is clearly overdetermined, or whether someone is just bullshitting. This is in contrast to Fermi estimates, which aim to arrive at an estimate for a quantity of interest, and which I also feel warmly about but which aren’t the subject of this post. In this post, I explain why I like quantitative sanity checks so much, and I give some examples.

Why I like this so much

I like this so much because:

It is very defensible. There are some cached arguments against more quantified estimation, but sanity checking cuts through most, if not all, of them. “Oh, well, I just think that estimation has some really nice benefits in terms of sanity checking and catching bullshit, and in particular in terms of defending against scope insensitivity. And I think we are not even at the point where we are deploying enough estimation to catch all the mistakes that would be obvious in hindsight after we did some estimation” is both something I believe and also just a really nice motte to retreat to when I am tired, don’t feel like defending a more ambitious estimation agenda, or don’t want to alienate someone socially by having an argument.
It can be very cheap, a few minutes, a few Google searches. This means that you can practice quickly and build intuitions.
It is useful, as we will see below.

Some examples

Here are a few examples where I’ve found estimation to be useful for sanity-checking. I mention these because I think that the theoretical answer becomes stronger when paired with a few examples which display that dynamic in real life.

Photo Patch Foundation

The Photo Patch Foundation is an organization which has received a small amount of funding from Open Philanthropy:

Photo Patch has a website and an app that allows kids with incarcerated parents to send letters and pictures to their parents in prison for free. This diminishes barriers, helps families remain in touch, and reduces the number of children who have not communicated with their parents in weeks, months, or sometimes years.

It takes little digging to figure out that their costs are $2.5/photo. If we take the AMF numbers at all seriously, it seems very likely that this is not a good deal. For example, for $2.5 you can deworm several kids in developing countries, or buy a bit more than one malaria net. Or, less intuitively, trading a 0.05% chance of saving a statistical life for sending a photo to a prisoner seems like a pretty bad trade: 0.05% of a statistical life corresponds to 0.05/100 × 70 years × 365 days/year ≈ 12 to 13 statistical days.
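The arithmetic behind that last figure can be sketched in a few lines. The $2.5 per photo and the 70 years come from the text above; the cost of roughly $5,000 per statistical life saved is an assumed round number, used here only to show where a figure like 0.05% could come from.

```python
# Sanity check: what does the cost of one photo buy in statistical days of life?
COST_PER_PHOTO_USD = 2.5          # from the post
COST_PER_LIFE_SAVED_USD = 5_000   # ASSUMPTION: illustrative round number, not from the post
YEARS_PER_LIFE = 70               # from the post
DAYS_PER_YEAR = 365


def statistical_days(cost_usd, cost_per_life_usd=COST_PER_LIFE_SAVED_USD):
    """Days of statistical life that `cost_usd` could buy at the assumed cost per life."""
    fraction_of_life = cost_usd / cost_per_life_usd
    return fraction_of_life * YEARS_PER_LIFE * DAYS_PER_YEAR


fraction = COST_PER_PHOTO_USD / COST_PER_LIFE_SAVED_USD
days = statistical_days(COST_PER_PHOTO_USD)
print(f"{fraction:.2%} of a statistical life, about {days:.1f} days")
```

Changing the assumed cost per life by a factor of two in either direction moves the answer between roughly 6 and 26 days, which does not change the conclusion of the sanity check.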

One can then do somewhat more elaborate estimations about criminal justice reform.

Sanity-checking that supply chain accountability has enough scale

At some point in the past, I looked into supply chain accountability, a cause area related to improving how multinational corporations treat labor. One quick sanity check is, well, how many people does this affect? You can check, and per here¹, Inditex—a retailer which owns brands like Zara, Pull&Bear, Massimo Dutti, etc.—employed 3M people in its supply chain, as of 2021.

So the scale is large enough that this may warrant further analysis. Once this simple sanity check is passed, one can then go on and do some more complex estimation about how cost-effective improving supply chain accountability is, like here.

Sanity checking the cost-effectiveness of the EA Wiki

In my analysis of the EA Wiki, I calculated how much the person behind the EA Wiki was being paid per word, and found that it was in the ballpark of other industries. If it had been egregiously low, my analysis could have been shorter, and maybe concluded that this was a really good bargain. If the amount had been egregiously high, maybe I would have had to dig in about why that was.

As it was, the sanity check was passed, and I went on to look at other considerations.

Optimistic estimation for early causes

Occasionally, I’ve seen some optimistic cost-effectiveness estimates by advocates of a particular cause area or approach (e.g., here, here, or here). One possible concern is that because it’s the advocates who are doing these cost-effectiveness estimates, they might be biased upwards. But even if they are biased upwards, they are not completely uninformative: they show that there exist at least some assumptions and parameters, chosen by someone who is trying their best, under which the proposed intervention looks great. Further research might then reveal that the initial optimism is or isn’t warranted. But that first hurdle isn’t trivial.

Other examples

You can see the revival of LessWrong pretty clearly if you look at the number of votes per year. Evaluating the value of that revival is much harder, but one first sanity check is to see whether there was some reviving being done.
When evaluating small purchases, sometimes the cost of the item is much lower than the cost of thinking about it, or the cost of the time one would spend using the item (e.g., for me, the cost of a hot chocolate is smaller than the cost of sitting down to enjoy a hot chocolate). I usually take this as a strong sign that the price shouldn’t be the main consideration for those types of purchase, and that I should remember that I am no longer a poor student.
Some causes, like rare diseases, are not going to pass a cost-effectiveness sanity check, because they affect too few people.
If you spend a lot of time in front of a computer, or having calls, the cost of better computer equipment and a better microphone is most likely worth it. I wish I’d internalized this sooner.
Raffles and lotteries (e.g., “make three forecasts and enter a lottery to win $300”, or “answer this survey to enter a raffle to win $500”) are usually not worth it, because they don’t reveal the number of people who enter, and it’s usually fairly high.
etc.
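The raffle item in the list above can be turned into a quick expected-value check. The $300 prize is taken from the example; the number of entrants, the minutes spent, and the value of an hour are hypothetical placeholders, and the sketch assumes a single winner drawn with equal odds.

```python
def raffle_net_value(prize_usd, entrants, minutes_spent, hourly_value_usd):
    """Expected winnings minus the value of the time spent entering."""
    expected_winnings = prize_usd / entrants  # assumes one winner, equal odds
    time_cost = (minutes_spent / 60) * hourly_value_usd
    return expected_winnings - time_cost


def break_even_entrants(prize_usd, minutes_spent, hourly_value_usd):
    """Number of entrants at which expected winnings equal the time cost."""
    time_cost = (minutes_spent / 60) * hourly_value_usd
    return prize_usd / time_cost


# HYPOTHETICAL inputs: 500 entrants, 30 minutes of effort, time valued at $30/hour.
net = raffle_net_value(prize_usd=300, entrants=500, minutes_spent=30, hourly_value_usd=30)
limit = break_even_entrants(prize_usd=300, minutes_spent=30, hourly_value_usd=30)
print(f"net value: ${net:.2f}; break-even at about {limit:.0f} entrants")
```

Since the organizers rarely reveal the number of entrants, the break-even figure is the more useful output: if it seems implausible that so few people entered, the raffle fails the sanity check.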

Conclusion

I explained why I like estimates as sanity checks: they are useful, cheap, and very defensible. I then gave several examples of dead-simple sanity checks, and in each case pointed to more elaborate follow-up estimates.

Comments (7)

Vasco Grilo🔸 · Mar 24 2023 · 13 karma

Great post, Nuño!

Your 1st example about the Photo Patch Foundation reminded me of SoGive's shallow analyses, whose methodology is described here. I encourage people interested in practicing estimation to check them out.

To illustrate, here are the summaries of the 1st 3 I did during my SoGive volunteering back in 2021 (which were actually my 1st 3 EA-type analyses!):

Analysis of Royal Opera House:

The Royal Opera House Covent Garden Foundation is rated as Not Recommended (Firm) on the SoGive ratings scale.
The Royal Opera House Covent Garden Foundation (ROHF) is the charity running the Royal Opera House (ROH), which is an opera house and major performing arts venue in Covent Garden, central London.
Based on its 2018/2019 annual report, we believe the expenditure by the charity is on average about £17 per visit to the ROHF spaces or attendance of ROH performances or cinema screenings.
For this cost, a Gold-rated mental health organisation would be expected to avert about 1 month of severe depression. While there may be positive impacts on well-being from observing a performance, and less tangible cultural benefits, we do not believe such impacts are as good as averting 1 month of severe depression.
For this reason, it is very likely that a donation to the ROHF will achieve less positive impact than the same amount of money given to a Gold-rated organisation.

Analysis of The Church Of Jesus Christ Of Latter Day Saints:

The Church Of Jesus Christ Of Latter-Day Saints (Great Britain) is rated as Not Recommended (Firm) on the SoGive ratings scale.
The Church Of Jesus Christ Of Latter-Day Saints (Great Britain) (LDSGB) has the objective of promoting and furthering the religious and other charitable work of its parent organisation, The Church of Jesus Christ of Latter-Day Saints (Church), especially in the United Kingdom.
Based on its 2019 annual report, we believe the expenditure per member of the Church in the UK during 2019 was about £820.
For this cost, a Gold-rated organisation would be expected to avert about 4 years of severe depression. While there may be positive impacts on well-being from being a member of the Church, we do not believe these are as good as averting 4 years of severe depression.
For this reason, it is very likely that a donation to LDSGB will achieve less positive impact than the same amount of money given to a Gold-rated organisation.

Analysis of The British Museum Trust:

The British Museum Trust Limited is rated as Not Recommended (Firm) on the SoGive ratings scale.
The overall objective of The British Museum Trust Limited (BMT), to be achieved through the award of grants, is "to advance, in a manner in which the Trustees of the charity see fit, the charitable objects of the Trustees of the British Museum (as may be amended from time to time); and to advance culture, heritage, science, education and the arts for public benefit throughout the world in any manner incidental, conducive to or compatible with the charitable objects of the Trustees of the British Museum" (2019/2020 annual report, p. 4).
Based on its 2018/2019 and 2019/2020 annual reports, we believe the average cost to the charity per visit to the British Museum is at least £0.34.
For this cost, a Gold-rated organisation could be expected to avert about 15 hours of severe depression. While there may be positive impacts on well-being from visiting the British Museum, and less tangible cultural benefits, we do not believe these are as good as averting 15 hours of severe depression.
For this reason, it is very likely that a donation to BMT will achieve less positive impact than the same amount of money given to a Gold-rated organisation.

NunoSempere · Mar 25 2023 · 4 karma

Thanks Vasco, these are great. Though, where are you getting the depression baseline from?

Vasco Grilo🔸 · Mar 25 2023 · 4 karma

Ah, sorry, I should have clarified. I used SoGive's Gold Standard Benchmark of £200 per year of severe depression averted. This was obtained by surveying a sample of 500 nationally representative UK residents, 13 EAs, and SoGive's team (see details in the link). I suppose Ishaan will come up with a better estimate in the process of this.
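For anyone who wants to reproduce the three summaries above, here is a minimal sketch of the conversion. It assumes only the £200 per year benchmark and the per-unit costs quoted in the analyses; the function name and the dictionary labels are made up for illustration.

```python
BENCHMARK_GBP_PER_YEAR = 200  # SoGive Gold Standard Benchmark quoted above
HOURS_PER_YEAR = 365 * 24


def depression_years_averted(cost_gbp):
    """Years of severe depression a Gold-rated organisation would avert for `cost_gbp`."""
    return cost_gbp / BENCHMARK_GBP_PER_YEAR


# Per-unit costs as quoted in the three summaries.
costs_gbp = {
    "Royal Opera House, per visit": 17,
    "LDSGB, per UK member": 820,
    "British Museum, per visit": 0.34,
}

for name, cost in costs_gbp.items():
    years = depression_years_averted(cost)
    print(f"{name}: {years:.4f} years "
          f"= {years * 12:.1f} months = {years * HOURS_PER_YEAR:.0f} hours")
```

The results line up with the summaries: about 1 month for £17, about 4 years for £820, and about 15 hours for £0.34.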

NickLaing · Mar 23 2023 · 9 karma

Amazing, I think this is a great (if fairly intuitive) concept, and I feel like this post might deserve more attention.

I think I do this quite a lot, but I haven't seen this crystallised so well before. I think we should all be sanity checking all the time.

I did have to sanity check one of your sanity checks though. Some "Neglected diseases" (as defined by the WHO) actually affect lots of people. E.g. Schistosomiasis infects something like 340 million people and might cause something like 2 million DALYs a year, which is hardly chicken feed ;)

Also am honoured (sort of) that you included my analysis of OneDay Health in your examples haha

NunoSempere · Mar 23 2023 · 2 karma

Thanks Nick.

"Neglected diseases" (as defined by the WHO)

Yeah, I was thinking more like rare genetic diseases. Edited to say rare rather than neglected.

sjeffh · Mar 21 2023 · 3 karma

I agree with the overall point that sanity-checking with estimation is a good idea, but I don't find the Photo Patch Foundation example very compelling for this point. $2.50/photo or 12 quality-adjusted days of life per photo seems acceptable to me given the long-term productivity benefits of improving the morale of both the incarcerated parent and the kid.

NunoSempere · Mar 22 2023 · 5 karma

Yeah, I agree they seem acceptable/good on an absolute level (though as mentioned I think that much better interventions exist).