[Stats4EA] Uncertain Probabilities

Nice post! Here's an illustrative example in which the distribution of p matters for expected utility.

Say you and your friend are deciding whether to meet up but there's a risk that you have a nasty, transmissible disease. For each of you, there's the same probability p that you have the disease. Assume that whether you have the disease is independent of whether your friend has it. You're not sure if p has a beta(0.1,0.1) distribution or a beta(20,20) distribution, but you know that the expected value of p is 0.5.

If you meet up, you get +1 utility. If you meet up and one of you has the disease, you'll transmit it to the other person, and you get -3 utility. (If you both have the disease, then there's no counterfactual transmission, so meeting up is just worth +1.) If you don't meet up, you get 0 utility.

It makes a difference which distribution p has. Here's an intuitive explanation. In the first case, it's really unlikely that one of you has it but not the other. Most likely, either (i) you both have it, so meeting up will do no additional harm or (ii) neither of you has it, so meeting up is harmless. In the second case, it's relatively likely that one of you has the disease but not the other, so you're more likely to end up with the bad outcome.

If you crunch the numbers, you can see that it's worth meeting up in the first case, but not in the second. For this to be true, we have to assume conditional independence: that you and your friend having the disease are independent events, conditional on the probability of an arbitrary person having the disease being p. It doesn't work if we assume unconditional independence but I think conditional independence makes more sense.

The calculation is a bit long-winded to write up here, but I'm happy to if anyone is interested in seeing/checking it. The gist is to write the probability of a state obtaining as the integral wrt p of the probability of that state obtaining, conditional on p, multiplied by the pdf of p (i.e. P(state) = ∫ P(state|p) f(p) dp). Separate the states via conditional independence (i.e. P(you have it, your friend has it|p) = P(you have it|p) P(your friend has it|p)) and plug in values (e.g. P(you have it|p)=p) and integrate. Here's the calculation of the probability you both have it, assuming the beta(0.1,0.1) distribution: P(you both have it) = ∫ p^2 f(p) dp = E[p^2] = Var(p) + E[p]^2 = 0.25/1.2 + 0.25 ≈ 0.458. Then calculate the expected utility of meeting up as normal, with the utilities above and the probabilities calculated in this way. If I haven't messed up, you should find that the expected utility is positive in the beta(0.1,0.1) case (i.e. better to meet up) and negative in the beta(20,20) case (i.e. better not to meet up).
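For anyone who wants to check the numbers, here is a minimal Python sketch of the calculation described above. It skips the integration by using the standard closed-form second moment of a beta distribution, E[p^2] = a(a+1)/((a+b)(a+b+1)). The function names are my own, and I've read the utilities as +1 when you both have it or neither has it, and -3 in total when exactly one of you has it.

```python
def beta_second_moment(a, b):
    """E[p^2] for p ~ Beta(a, b), from the standard closed form."""
    return a * (a + 1) / ((a + b) * (a + b + 1))


def expected_utility_conditional(a, b, u_same=1.0, u_one=-3.0):
    """Expected utility of meeting up when you and your friend are
    independent *conditional on p*, with p ~ Beta(a, b)."""
    mean = a / (a + b)
    second = beta_second_moment(a, b)
    p_both = second                      # E[p^2]
    p_neither = 1 - 2 * mean + second    # E[(1 - p)^2]
    p_exactly_one = 2 * (mean - second)  # E[2 p (1 - p)]
    return u_same * (p_both + p_neither) + u_one * p_exactly_one


def expected_utility_unconditional(mean, u_same=1.0, u_one=-3.0):
    """Same decision if the two infections were unconditionally independent,
    so that only the mean of p matters."""
    p_exactly_one = 2 * mean * (1 - mean)
    return u_same * (1 - p_exactly_one) + u_one * p_exactly_one


if __name__ == "__main__":
    for a, b in [(0.1, 0.1), (20, 20)]:
        eu = expected_utility_conditional(a, b)
        print(f"beta({a},{b}): expected utility of meeting up = {eu:.3f}")
    print(f"unconditional independence: {expected_utility_unconditional(0.5):.3f}")
```

Under these assumptions the expected utility comes out at about 0.667 for beta(0.1,0.1) and about -0.951 for beta(20,20), while under unconditional independence it is -1 whichever distribution p has, since only the mean of 0.5 enters.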

How good is The Humane League compared to the Against Malaria Foundation?

Thanks, this is a good criticism. I think I agree with the main thrust of your comment but in a bit of a roundabout way.

I agree that focusing on expected value is important and that ideally we should communicate how arguments and results affect expected values. I think it's helpful to distinguish between (1) expected value estimates that our models output and (2) the overall expected value of an action/intervention, which is informed by our models and arguments etc. The guesstimate model is so speculative that it doesn't actually do that much work in my overall expected value, so I don't want to overemphasise it. Perhaps we under-emphasised it though.

The non-probabilistic model is also speculative of course, but I think this offers stronger evidence about the relative cost-effectiveness than the output of the guesstimate model. It doesn't offer a precise number in the same way that the guesstimate model does but the guesstimate model only does that by making arbitrary distributional assumptions, so I don't think it adds much information. I think that the non-probabilistic model offers evidence of greater cost-effectiveness of THL relative to AMF (given hedonism, anti-speciesism) because THL tends to come out better and sometimes comes out much, much better. I also think this isn't super strong evidence but that you're right that our summary is overly agnostic, in light of this.

In case it's helpful, here's a possible explanation for why we communicated the findings in this way. We actually came into this project expecting THL to be much more cost-effective, given a wide range of assumptions about the parameters of our model (and assuming hedonism, anti-speciesism) and we were surprised to see that AMF could plausibly be more cost-effective. So for me, this project gave an update slightly in favour of AMF in terms of expected cost-effectiveness (though I was probably previously overconfident in THL). For many priors, this project should update the other way and for even more priors, this project should leave you expecting THL to be more cost-effective. I expect we were a bit torn in communicating how we updated and what the project showed and didn't have the time to think this through and write this down explicitly, given other projects competing for our time and energy. It's been helpful to clarify a few things through this discussion though :)

How good is The Humane League compared to the Against Malaria Foundation?

Thanks for raising this. It's a fair question but I think I disagree that the numbers you quote should be in the top level summary.

I'm wary of overemphasising precise numbers. We're really uncertain about many parts of this question and we arrived at these numbers by making many strong assumptions, so these numbers don't represent our all-things-considered-view and it might be misleading to state them without a lot of context. In particular, the numbers you quote came from the Guesstimate model, which isn't where the bulk of the work on this project was focused (though we could have acknowledged that more). To my mind, the upshot of this investigation is better described by this bullet in the summary than by the numbers you quote:

**In this model, in most of the most plausible scenarios, THL appears better than AMF.** The difference in cost-effectiveness is usually within 1 or 2 orders of magnitude. Under some sets of reasonable assumptions, AMF looks better than THL. **Because we have so much uncertainty, one could reasonably believe that AMF is more cost-effective than THL or one could reasonably believe that THL is more cost-effective than AMF.**

How good is The Humane League compared to the Against Malaria Foundation?

Thanks for this. I think this stems from the same issue as your nitpick about AMF bringing about outcomes as good as saving lives of children under 5. The Founders Pledge Animal Welfare Report estimates that THL historically brought about outcomes as good as moving 10 hen-years from battery cages to aviaries per dollar, so we took this as our starting point and that's why this is framed in terms of moving hens from battery cages to aviaries. We should have been clearer about this though, to avoid suggesting that the only outcomes of THL are shifts from battery cages to aviaries.

How good is The Humane League compared to the Against Malaria Foundation?

Thanks for this comment, you raise a number of important points. I agree with everything you've written about QALYs and DALYs. We decided to frame this in terms of DALYs for simplicity and familiarity. This was probably just a bit confusing though, especially as we wanted to consider values of well-being (much) less than 0 and, in principle, greater than 1. So maybe a generic unit of hedonistic well-being would have been better. I think you're right that this doesn't matter a huge amount because we're uncertain over many orders of magnitude for other variables, such as the moral weight of chickens.

The trade-off problem is really tricky. I share your scepticism about people's actual preferences tracking hedonistic value. We just took it for granted that there is a single, privileged way to make such trade-offs but I agree that it's far from obvious that this is true. I had in mind something like "a given experience has well-being -1 if an idealised agent/an agent with the experiencer's idealised preferences would be indifferent between non-existence and a life consisting of that experience as well as an experience of well-being 1". There are a number of problems with this conception, including the issue that there might not be a single idealised set of preferences for these trade-offs, as you suggest. I think we needed to make some kind of assumption like this to get this project off the ground but I'd be really interested to hear thoughts/see future discussion on this topic!

Founders Pledge Charity Recommendation: Action for Happiness

Yes, feeling much better now fortunately! Thanks for these thoughts and studies, Derek.

Given our time constraints, we did make some judgements relatively quickly but in a way that seemed reasonable for the purposes of deciding whether to recommend AfH. So this can certainly be improved and I expect your suggestions to be helpful in doing so. This conversation has also made me think it would be good to explore six monthly/quarterly/monthly retention rates rather than annual ones - thanks for that. :)

Our retention rates for StrongMinds were also based partly on this study, but I wasn't involved in that analysis so I'm not sure on the details of the retention rates there.

Founders Pledge Charity Recommendation: Action for Happiness

Yes, we had physical health problems in mind here. I appreciate this isn't clear though - thanks for pointing this out. Indeed, we are aware of the underestimation of the badness of mental health problems and aim to take this into account in future research in the subjective well-being space.

Founders Pledge Charity Recommendation: Action for Happiness

Thanks very much for this thoughtful comment and for taking the time to read and provide feedback on the report. Sorry about the delay in replying - I was ill for most of last week.

1. Yes, you're absolutely right. The current bounds are very wide and they represent extreme, unlikely scenarios. We're keen to develop probabilistic models in future cost-effectiveness analyses to produce e.g. 90% confidence intervals and carry out sensitivity analyses, probably using Guesstimate or R. We didn't have time to do so for this project but this is high on our list of methodological improvements.

2. Estimating the retention rates is challenging so it's helpful for us to know that you think our values are too high. We based this primarily on our retention rate for StrongMinds, but adjusted downwards. It's possible we anchored on this too much. However, it's not clear to me that our values are too high. In particular, if our best-guess retention rate for AfH is too high, then this is probably also true for StrongMinds. Since we're using StrongMinds as a benchmark, this might not change our conclusions very much.

The total benefits are calculated somewhat confusingly and I appreciate you haven't had the chance to look at the CEA in detail. If e is the effect directly post-treatment and r is the retention rate, we calculated the total benefits as

0.5e + re + r^2 e + ...

That is, we assume half a year of full effect, and then discount each year that follows by r each time. We calculated it in this way because for StrongMinds, we had 6 month follow-up data. However, it's not clear that this approach is best in this case. It might have been better to:

- Assume 0.15 years at full effect
  - Since the study has only an 8 week follow-up, as you mention
- Assume somewhere in between 0.15 and 0.5 years at full effect
  - Since the effects still looked very good at 8 week follow-up (albeit with no control) and evidence from interventions such as StrongMinds that suggest longer-lasting effects still seems somewhat relevant
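To give a feel for how much the choice of full-effect period matters, here is a rough Python sketch of one reading of the calculation described above: a period at full effect, then each later year discounted by the retention rate once more. The function name, the 10-year horizon, and the example effect and retention values are placeholders I've made up for illustration, not figures from the CEA.

```python
def total_benefit(effect, retention, full_effect_years=0.5, later_years=10):
    """Total benefit: a period at full effect, then `later_years` further
    years, each discounted by the retention rate one more time.

    The finite horizon is an assumption for illustration only.
    """
    benefit = full_effect_years * effect
    for year in range(1, later_years + 1):
        benefit += effect * retention ** year
    return benefit


if __name__ == "__main__":
    effect, retention = 1.0, 0.5  # hypothetical values, not from the CEA
    for full_years in (0.15, 0.3, 0.5):
        total = total_benefit(effect, retention, full_effect_years=full_years)
        print(f"{full_years} years at full effect -> total benefit {total:.3f}")
```

On this reading, moving from 0.5 to 0.15 years at full effect lowers the total by 0.35 times the post-treatment effect, whatever the retention rate, so the change matters proportionally more when the retention rate is low.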

Finally, I think there are good reasons to prefer AfH over CBT in high-income countries, even if our CEA suggests they are similarly cost-effective in terms of depression. (Though they might not be strong enough to convince you that AfH and e.g. StrongMinds are similarly cost-effective.)

- AfH aims to improve well-being broadly, not just by treating mental health problems.
  - Although much -- perhaps most -- of the benefits of AfH's courses come from reduction in depression, some of the benefits to e.g. happiness, life satisfaction and pro-social behaviour aren't captured by measuring depression
- Our CEA is very conservative in some respects
  - The effect sizes we used (after our Bayesian analysis) are about 30% as large as reported in the study
  - If CBT effects aren't held to similar levels of scrutiny, then we can't compare cost-effectiveness fairly
- We think that the wider benefits of AfH's scale-up could be very large
  - We focused just on the scale-up of the Exploring What Matters courses because this is easiest to measure
  - The happiness movement that AfH is leading and growing could be very beneficial, e.g. widely sharing materials on AfH's website, bringing (relatively small) benefits to a large number of people

That said, I think it's worth reconsidering our retention rates when we review this funding opportunity. Thanks for your input.

3. This is correct. We did not account for the opportunity cost of facilitators' or participants' time. As always, there are many factors and given time constraints, we couldn't account for all of them. We thought that these costs would be small compared to the benefits of the course so we didn't prioritise their inclusion. I don't think we explicitly mentioned the opportunity cost of time in the report though, so thanks for pointing this out.

Does anyone have any recommendations for landmine charities, or know of impact assessments?

Here's another option that detects landmines with rats: https://www.apopo.org/en

Can't comment on cost-effectiveness compared to other similar organisations but it won a Skoll Award for Social Entrepreneurship in 2009 http://skoll.org/organization/apopo/ http://skoll.org/about/skoll-awards/ https://en.m.wikipedia.org/wiki/Skoll_Foundation#The_Skoll_Awards_for_Social_Entrepreneurship

Reflecting on this example and your x-risk questions, this highlights the fact that in the beta(0.1,0.1) case, we're either very likely fine or really screwed, whereas in the beta(20,20) case, it's similar to a fair coin toss. So it feels easier to me to get motivated to work on mitigating the second one. I don't think that says much about which is higher priority to work on though because reducing the risk in the first case could be super valuable. The value of information narrowing uncertainty in the first case seems much higher though.