By Sindy Li.
Cross-posted from the Oxford Prioritisation Project blog. We're centralising all discussion on the Effective Altruism forum. To discuss this post, please comment here.
Summary: We built a quantitative model estimating the impact of 80,000 Hours. To measure our uncertainty, we built the model using Monte Carlo simulations in Guesstimate. This post acts as an appendix to that quantitative model.
1. Model mechanics
We would like to estimate the marginal impact of donating 10,000 GBP to 80,000 Hours (henceforth “80K”) now.
We approximate it by the average impact per dollar of 80K for the year 2016. We will discuss limitations of this approach later.
For a meta-charity like Giving What We Can (henceforth “GWWC”) which persuades people to donate to their recommended charities, an important metric is the “multiplier”, namely for each dollar of their operation cost (i.e. of donation to GWWC), how many dollars of donations to their recommended charities are generated.
For 80K, it is more complicated because they aim to cause plan changes, some being career changes and some being changes in donation plans. The main idea of our model is to convert all plan changes to equivalent amounts of donations to charities in order to get the “multiplier”, and then use the cost-effectiveness of charities receiving donation to calculate the cost-effectiveness of the plan changes.
Here are the steps.
Step 1: Measuring (and reweighing) 80K’s output
Their outputs are impact-adjusted plan changes (IAPCs). For each plan change that results from their work (which they collect in the ways described here), they assign an impact score of 0.1, 1, or 10, depending on the magnitude of the impact. For some examples, see here.
To measure their output, we need to find the number of plan changes assigned scores of 0.1, 1 and 10 over 2016. From here we see that they had 1414 raw plan changes (i.e. each counting as 1), and 910.9 impact-adjusted plan changes (i.e. each weighted by its score). We don’t know the breakdown of the 1414 by score, so we eyeballed it from the third graph in this section, and got:
● 0.1: 700
● 1: 650
● 10: 14
They add up to 1364, and are equivalent to 860 impact-adjusted plan changes, so we are underestimating their output a bit.
Then, we apply an adjustment. Since people self-report plan changes, there could be social desirability bias and overstatement. We think that this may be a bigger problem for smaller plan changes, and less so for major ones (scored 10). So we reweigh 0.1’s and 1’s each by 40%, and do not adjust 10’s.
This gives us 40%*0.1*700 + 40%*1*650 + 10*14 = 428 reweighed impact-adjusted plan changes.
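The arithmetic in Step 1 can be reproduced with a short sketch. The counts and the 40% reweighting are the figures given above; the variable names are our own and purely illustrative.

```python
# Eyeballed 2016 plan-change counts by impact score (figures from the post).
counts = {0.1: 700, 1: 650, 10: 14}

# Reweighting for self-report bias: 40% for scores 0.1 and 1, none for 10.
reweigh = {0.1: 0.4, 1: 0.4, 10: 1.0}

# Raw plan changes: each change counts as 1.
raw_changes = sum(counts.values())

# Impact-adjusted plan changes: each change counts as its score.
iapc = sum(score * n for score, n in counts.items())

# Reweighed impact-adjusted plan changes: score times the bias adjustment.
reweighed_iapc = sum(reweigh[score] * score * n for score, n in counts.items())

print(f"raw plan changes:        {raw_changes}")
print(f"impact-adjusted (IAPCs): {iapc:.0f}")
print(f"reweighed IAPCs:         {reweighed_iapc:.0f}")
```

Keeping the counts and the adjustment factors in separate tables makes it easy to try a different reweighting without touching the eyeballed breakdown.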
Step 2: Converting output (IAPCs) to equivalent $s in donation
They say (here): “A typical plan change scored 1 is someone who has taken the Giving What We Can pledge or decided to earn to give in a medium income career.” Assuming that the scores of 0.1, 1 and 10 correctly capture the magnitude of impact of these career changes, we can use a GWWC pledge as the benchmark for measuring the impact of any plan change.
What is the value of a GWWC pledge in terms of donations to GWWC recommended charities? They say (here, the third point under “Impact and cost-effectiveness”): “In 2016, we caused 115 people to take the Giving What We Can (GWWC) 10% pledge. GWWC estimates this is worth about £5 million in donations to their recommended charities (counterfactually-adjusted, time-discounted, dropout adjusted).” So each pledge is worth on average 5,000,000/115 = 43,478 GBP of donation to GWWC’s recommended charities. (I’m not sure where the GWWC estimate comes from, and whether it applies to GWWC pledges in general or only the ones caused by 80K.)
428 reweighed IAPCs * (43,478 GBP per IAPC) = 18m GBP in total donations (equivalent)
Note that we are assuming they are assigning impact correctly, namely
● Each plan change scored 1 is as effective as a GWWC pledge, i.e. as effective as 43,478 GBP of donation to GWWC’s recommended charities
● A plan change scored 0.1 is 1/10 as effective as a plan change scored 1
● A plan change scored 10 is 25 (rather than 10, a result of the adjustment: 10 / (40% * 1) = 25) times as effective as a plan change scored 1
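The conversion in Step 2 can be sketched as follows. The inputs are the figures quoted above (115 pledges, £5 million, 428 reweighed IAPCs); the constant names are illustrative. The unrounded total is about 18.6m GBP, which the text rounds to 18m.

```python
# Figures quoted in the post.
GWWC_PLEDGES_CAUSED = 115               # pledges 80K reports causing in 2016
GWWC_ESTIMATED_VALUE_GBP = 5_000_000    # GWWC's estimate for those pledges
REWEIGHED_IAPC = 428                    # result of Step 1

# Average value of one pledge, used as the benchmark for one IAPC.
value_per_pledge_gbp = GWWC_ESTIMATED_VALUE_GBP / GWWC_PLEDGES_CAUSED

# Total equivalent donations attributed to 80K for 2016.
total_equivalent_gbp = REWEIGHED_IAPC * value_per_pledge_gbp

print(f"value per pledge:  {value_per_pledge_gbp:,.0f} GBP")
print(f"total equivalent:  {total_equivalent_gbp:,.0f} GBP")
```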
Step 3: Multiplier
We now have the total amount of (equivalent) donations 80K generated in 2016. We just need to divide by their operation cost in 2016 to get the multiplier. Their operation cost in 2016, including opportunity cost (of their staff not earning to give), is 500,000 GBP (the financial cost alone is 250,000 GBP). We use the 500,000 GBP figure.
Multiplier = total donations (equivalent) generated / total operation cost = (18m)/(500k) = 36, for 2016
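As a quick check on Step 3, the sketch below computes the multiplier twice: once from the rounded 18m GBP total used above, and once from the unrounded total implied by Step 2. The function name is illustrative. The rounded figure reproduces 36, and the unrounded one comes out slightly higher, at about 37.

```python
OPERATION_COST_GBP = 500_000                 # 2016 cost incl. opportunity cost
TOTAL_EQUIVALENT_GBP_ROUNDED = 18_000_000    # rounded total used in the post
TOTAL_EQUIVALENT_GBP_UNROUNDED = 428 * 5_000_000 / 115  # Step 2, unrounded


def multiplier(total_equivalent_gbp, operation_cost_gbp):
    """Equivalent donations generated per unit of operation cost."""
    return total_equivalent_gbp / operation_cost_gbp


multiplier_rounded = multiplier(TOTAL_EQUIVALENT_GBP_ROUNDED, OPERATION_COST_GBP)
multiplier_unrounded = multiplier(TOTAL_EQUIVALENT_GBP_UNROUNDED, OPERATION_COST_GBP)

print(f"multiplier (rounded total):   {multiplier_rounded:.1f}")
print(f"multiplier (unrounded total): {multiplier_unrounded:.1f}")
```

Because both numerator and denominator are in GBP, the multiplier is unitless, so it can be applied directly to a cost-effectiveness figure expressed per dollar.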
Step 4: Cost-effectiveness: donations to DALY/$
For the last step, to get the cost-effectiveness of $1 in donation to 80K (i.e. $1 of their operation cost), we just need to multiply the “multiplier” by the cost-effectiveness of $1 in donation to GWWC recommended charities (again, under the assumptions of conversion of cost-effectiveness between plan changes).
For the cost-effectiveness of $1 in donation to GWWC recommended charities, we use AMF, or more precisely, GiveWell’s median value of AMF. It is about 0.011 DALY/$ (we convert a life saved equivalent to 35 DALYs, and use the median value of AMF, $3162, from here). Note that AMF does not have the highest cost-effectiveness (in median values) among GiveWell’s top charities.
So for each dollar in donation to 80K in 2016, there are 36*0.011 = 0.396, or about 0.4, DALYs averted.
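The final step can be sketched in code, taking the multiplier of 36 from Step 3 and the AMF figures quoted above (35 DALYs per life saved, $3162 per life). Since the full model was built with Monte Carlo simulation in Guesstimate, the sketch also shows how a point estimate turns into a distribution. The log-normal shape and its spread (`ASSUMED_LOG_SIGMA`) are hypothetical placeholders chosen for illustration; they are not the inputs of the actual Guesstimate model.

```python
import random
import statistics

DALYS_PER_LIFE = 35        # conversion used in the post
COST_PER_LIFE_USD = 3162   # GiveWell median for AMF, as quoted in the post
MULTIPLIER = 36            # point estimate from Step 3

# HYPOTHETICAL: spread of the multiplier on a log scale; not from the post.
ASSUMED_LOG_SIGMA = 0.5


def dalys_per_dollar(multiplier):
    """DALYs averted per dollar donated to 80K, for a given multiplier."""
    return multiplier * DALYS_PER_LIFE / COST_PER_LIFE_USD


def simulate(n_draws=10_000, seed=0):
    """Draw multipliers from a log-normal whose median is MULTIPLIER."""
    rng = random.Random(seed)
    draws = sorted(
        dalys_per_dollar(MULTIPLIER * rng.lognormvariate(0.0, ASSUMED_LOG_SIGMA))
        for _ in range(n_draws)
    )
    return {
        "p5": draws[int(0.05 * n_draws)],
        "median": statistics.median(draws),
        "p95": draws[int(0.95 * n_draws)],
    }


point_estimate = dalys_per_dollar(MULTIPLIER)
summary = simulate()

print(f"point estimate: {point_estimate:.3f} DALYs per dollar")
print(f"simulated 5%/median/95%: "
      f"{summary['p5']:.3f} / {summary['median']:.3f} / {summary['p95']:.3f}")
```

The point estimate here uses the unrounded ratio 35/3162 rather than 0.011, so it differs slightly from the hand calculation. In a fuller version, every uncertain input (the eyeballed counts, the reweighting factor, the pledge value) would get its own distribution rather than only the multiplier.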
2. Model limitations
1) Marginal vs. Average
We wanted the marginal impact of donating 10,000 GBP to 80K now. We approximated it by the average impact per dollar of 80K for the year 2016.
Why 2016, rather than e.g. the historical average? Their financial cost per impact-adjusted plan change has been going down over the years (see last row of first table here), so the 2016 average cost-effectiveness will probably be closer to the 2017 numbers.
To get 2017 average cost-effectiveness, we need predictions about 2017, e.g. their operation cost and number of plan changes they will generate.
From there, to get the marginal cost-effectiveness of a donation now, we also need to know what a marginal donation (of 10,000 GBP) will do, which depends on 1) their current financial situation, 2) their plan for marginal donations.
In March 2017, they announced that they have reached their 2017 fundraising target (counting donations that have been promised but not yet received for 2017). What do they plan to do with additional funding? They say (on the same page) that “Based on this, I’d say we’re not heavily funding constrained; although we could still use additional money to do things like try to attract more experienced staff with higher salaries and a better office, or take more risk by trying to grow faster.” Previously, they said (see here): "Moreover, even if we made this target, it wouldn’t exhaust our room for more funding. If we raised more, we could increase our reserves, which would make it easier to attract staff, or we could pursue our expansion opportunities more aggressively (e.g. hire more, larger marketing budget)."
How should we think about all these things they could do with additional funding now?
● Attracting more experienced staff with higher salary and nicer office: more experienced staff are more productive which would increase the average cost-effectiveness above the current level, so the marginal must be greater than the current average. (But this does not happen for sure: donating 10,000 GBP will merely increase the probability that happens by something less than 1, since to fully fund new staff they need more additional donation which is not so likely to come from other donors given they have reached their target. So we’d then adjust by some less-than-1 probability.)
● Increasing reserve: in addition to helping attract more experienced staff, this frees up staff time from fundraising (for future years). The freed-up time would probably be less cost-effective than the current average, since presumably staff already spend their time on the most productive things first.
● Expanding marketing budget: we are not sure about its cost-effectiveness. (Someone thinks additional funding is unlikely to go to marketing since they are running a marketing experiment now and there is not much value in expanding its size -- see point number 5 here, and here. But they also have less experimental marketing approaches that could be scaled up.)
Overall, it’s possible that marginal cost-effectiveness (of additional donation now, after they have reached 2017 fundraising target) will be lower than the average for 2017 without additional donation, but it may not be by much and we do not have a principled way of adjusting for it. So we end up not doing any adjustment.
Conceptually, some have argued that for meta-charities, marginal cost-effectiveness could be much lower than average cost-effectiveness (e.g. see here: “Meta Trap #6. Marginal impact may be much lower than average impact”). I find it helpful to think through the specifics like we did above. (In general, it seems helpful if charities can share their budget as well as current funding situations in a transparent way similar to 80K, to improve coordination among small, low information donors. An alternative is to do what GiveWell top charities do: share that information with GiveWell which updates recommended donation allocations and does regranting. Another alternative is to donate through a donor lottery or the EA fund where a single party collects information from the charity.)
Conversely, some have argued for increasing returns in small organizations. I do not think we have a case here of increasing returns in money (or staff time). Whether returns are increasing or decreasing in additional funding depends on how the funding is received. Expecting a large chunk of funding (either in the form of receiving such amounts at once, or even expecting a large total amount received in small chunks if there is no lumpy investment or borrowing constraint) could enable an organization to take more risks, while getting unanticipated small amounts of funding at a time -- even if the total adds up to more -- will probably just lead the organization to use the marginal dollar to “fund the activity with the lowest (estimated) cost-effectiveness”. 80K’s stated plan for marginal funding at this stage seems consistent with the latter, since marginal funding probably won’t come in large chunks, especially given that they aren’t funding constrained now. The scenario Ben Todd has in mind probably applies more when a large funder is considering how much to give to an organization. This may be another argument for entering a donor lottery or donating through the EA fund: giving a large and certain amount of donations to a small organization enables them to plan ahead for riskier but growth-enhancing strategies, and hence could be more valuable than uncoordinated small amounts even if the latter add up to the same total (because the latter may be less certain).
2) Outcome measure
We relied heavily on the assumption that 80K measures impact correctly in their scoring system (except for applying our own reweighing). Is this a source of concern? I have not spot checked the raw data on plan changes and score assignment (which ideally I should as a donor), but someone else, external to 80K and experienced in the EA community who was facing a donation decision to them, did it and told me that they thought the measure was good.
Another related issue is whether donations (e.g. GWWC pledges that are all assigned 1) have heterogeneous impacts. We assume that they are all on par with AMF. GWWC’s top charities largely coincide with GiveWell’s list, and among the latter AMF is in the middle in terms of cost-effectiveness. However, it is possible that some who took the pledge (or do earning to give) give to other charities whose cost-effectiveness is either lower than or hard to compare with these charities.
One more issue with the plan change scores, specifically related to certain types of changes, is that sometimes people change their plan to work on far future interventions, e.g. AI safety. Such interventions are higher return but also higher variance than conventional global health interventions (e.g. see our MIRI model). In our model we convert everything to donations to global health charities, which results in relatively lower returns and lower variance. If we adjust it to reflect the fact that a fraction of the plan changes are in high return, high variance areas, what would happen? (In our aggregation model we use Bayesian updating, so both return and variance are important.) We wish we had done this but we haven’t. Here is a simple argument why 80K probably will still dominate our contender in the far future area, MIRI, even if we acknowledge that a part of 80K is now high return and high variance like MIRI: 80K in its current version (i.e. exclusively modeled with donations to global health) ended up winning in our aggregation (having a better posterior than MIRI); converting a fraction of it to be like a far future intervention will result in something like a mini MIRI that is more cost-effective than the actual MIRI (since it’s cheaper to persuade people to go into AI safety, which is how 80K causes such changes, than employing them at an organization like MIRI), so it dominates MIRI, and the remaining part (global health) also does as we saw, so the combination should still dominate MIRI. This is a hand waving, ex post argument that is not ideal. We wish we had done the actual comparison of the 2 versions of 80K models in our aggregation. (This question was raised at our presentation. In building our model, we were thinking in a very simplified way that neglected this concern.)
3) “Growth approach”
Some have proposed using the “growth approach” to evaluate a young non-profit rather than calculating the marginal impact. Ideally, the considerations outlined in the article should be incorporated in the cost-effectiveness analysis, just as investments should be evaluated by net present value (the expected, discounted stream of future profit) rather than profit in the current period. But in practice cost-effectiveness analyses of charities often neglect such considerations, not to mention that these things are hard to incorporate in a quantitative model.
In addition to looking at the potential to expand the market mentioned in the article, some other possibilities include: giving money to ensure organizational survival and growth so that the organization can 1) learn by doing and improve itself, 2) discover new opportunities that currently no one (including the funder or the organization itself) has thought about -- e.g. neither GiveWell itself nor its funders in the early days may have expected it to spin off the Open Philanthropy Project, and the same goes for Animal Charity Evaluators which grew out of 80,000 Hours. Such “unknown unknowns” are hard to address directly, and fostering such opportunities requires identifying young organizations with good people that have the potential to learn and grow.
Given the importance of such considerations and the difficulty of modelling them quantitatively, to holistically evaluate an organization, especially a young one, there is an argument for using a qualitative approach and “cluster thinking”, in addition to a quantitative approach and “sequential thinking.”
4) “Meta trap”
Even if we end up deciding that we should donate the 10,000 GBP to 80K because it will have the highest impact there, does that mean it is the best place for all EA donors (or at least low information donors)? Not necessarily, due to the static and unilateral nature of our model. I discuss some scenarios where our model can be taken too far and hence be no longer valid.
Suppose all other EA donors take literally our conclusion that 80K is the most cost-effective place to donate this year. Even assuming 80K doesn’t run into diminishing returns (the standard “room for more funding” concern), in that they can still reach the same number of people for each additional dollar of donation, we will run into problems.
First, suppose all other EAs donate only to 80K this year. Then other EA top charities including many object-level ones may be much more short on funding. This is bad not only because at that point marginal returns of donating to them (only in terms of the impact they have in carrying out object-level goals, e.g. distributing bednets) could exceed that of donating to 80K, but also because this may significantly weaken the chance of survival or growth for these organizations. These organizations not only carry out object-level work that improves lives, but also contribute to learning and capacity building (see in Howie’s comment here; also related to trap #5 here), something not captured in our model.
Now, we might still be okay if the new “EAs” generated by 80K still donate to recommended object-level charities so they still have enough funding (in fact maybe even way more than the counterfactual, since now all pre-existing EA donors donate to 80K and we assumed constant multiplier with scale -- a perhaps unrealistic assumption made just for the sake of the argument).
But imagine what happens when “new EAs” become “seasoned EAs” next year, start to reason in the EA style (rather than simply following GWWC’s recommendation) and donate to where money has the highest impact. Suppose they also take our recommendation literally, and into next year (an additional assumption), and all donate to 80K instead of object-level charities. Then a part of our model breaks down: the value of a GWWC pledge is going to be smaller than what we used, since over the lifetime of a “new EA” generated by 80K they are going to switch from donating to object-level charities to donating to 80K, and if this happens to every generation of “new EAs” generated by 80K, then in the end very little of their lifetime donation will go to object-level charities, and most will go to 80K which ends up generating little object-level donation.
Of course this is an extreme scenario: not only do we assume that people agree on the same highest-impact charity and make all their donations there, but also that every new generation keeps using our static model. (A more immediate, but similarly extreme, version of the scenario is when “new EAs” immediately become “seasoned EAs” and realize that instead of following GWWC’s current recommended object-level charities they should donate right now to 80K, in which case 80K’s chain of impact breaks down immediately since it causes no object-level work to be done.) But we could imagine that even in a weaker version of this, when our static recommendation for a unilateral donor is generalized inappropriately to all EA donors in a dynamic setting but in a less extreme fashion, 80K’s cost-effectiveness will still be undermined. (This is related to traps 2 and 4 discussed here.)
Now imagine a different scenario, where we still assume that all current EAs blindly follow our recommendation and make all donations to 80K. People approached by 80K who could potentially become “new EAs” look at what current EAs do and think: it seems dubious that these people are just donating to build their movement rather than doing object-level work. The multiplier may still apply if current EAs manage to get “new EAs” to contribute to object-level work at the same rate, but they are solely relying on new recruits to contribute to object-level work. They may have doubts about joining the movement, and may think existing EAs are mistaken or even brainwashed by certain self-serving “movement leaders” who care only about growing the “movement”. This could appear to be the case to outsiders even if existing EAs (including any movement leader) were truly trying to maximize impact. (This is another consequence of trap 2 discussed here: that it may hurt the growth of the movement by making it look “dubious”.)
Note that in the above scenarios I am treating 80K more like GWWC, which gets people to donate to charities, whereas in fact 80K focuses on plan changes of which donation is only a part. Hence these may not be the most appropriate examples, and I was only using them to illustrate some issues with taking a static model of a meta-charity that is aimed at a unilateral donor and generalizing it to the entire community in a dynamic setting. This is not a shortcoming of our model per se, but of all models of this static and unilateral nature: due to these limitations they should not be generalized beyond their appropriate scope.
In general, a lesson (from both the 80K model and our other quantitative models) is that many important considerations are very difficult (and perhaps impossible) to incorporate in a quantitative model. To really make the best judgement on whether to donate to a charity, qualitative arguments and “cluster thinking” may be valuable in addition to quantitative models.
 Some questions on this: what is the cost-effectiveness of different ways of reaching out to people (online materials, in person coaching etc.)? Online content is causing more of the impact-adjusted plan changes (see here), and is probably cheapest per plan change. See here for what they think are more/less useful.
 Note that GiveWell seems to take these into account in their charity recommendations, as quantitative cost-effectiveness analysis is only one element, and they often mention potential for learning and growth when talking about room for more funding.
 Some of the reasons here also point to the conclusion that even if some cause is found to be the most cost-effective at the moment (and even if it is not meta), the EA movement should not invest all resources in any given year in one place, due to the learning value from developing other causes and the possibility that the cost-effectiveness of different interventions changes over time. This is similar to some of the reasons why the Open Philanthropy Project selects multiple causes (another reason, worldview diversification, is also relevant for the EA movement overall, but beyond the scope of this model).
 Like the posts on “meta traps”, I am not arguing for less meta. We may well not have enough meta, especially early in the movement when capacity building could be relatively more important. And this should include not only groups like 80K and GWWC that increase the number of people in the EA movement, but also research groups like CEA that increase our knowledge and understanding of related issues. This, however, would be the topic for another discussion.
This post was submitted for comment to 80,000 Hours before publication.