Jun 13, 2016
Cause prioritization, or attempting to work out which cause is the most important, is widely considered an important cause in EA. But how important is cause prioritization? That is, how much should we prioritize cause prioritization within our prioritization of causes?
Act Now or Later?
EAs have argued in the past about whether it is better to “act now” or “act later”. For example, is it best to focus on growing your career capital (acting later) or on having a big impact with your current job (acting now)? Should you donate to the best charity you currently know (acting now) or should you invest your money to donate more later (acting later)?
Marginal increases in work on cause prioritization involve diverting time that could otherwise be spent on direct causes. Cause prioritization therefore fits into this framework as a vote for acting later, relative to doing work on an already known cause (acting now).
This means similar considerations apply. In “Should I Help Now or Later?”, 80,000 Hours and the Global Priorities Project summarized some key considerations:
The best opportunities to do good may be snapped up early, decreasing the opportunity to do good in the future. This would suggest you should focus on getting the best opportunities in whatever cause you think is best and not worry too much about cause prioritization.
Increased understanding of interventions could turn up something better than what we currently know. This would suggest cause prioritization could have a lot of value if done now.
The Value of Information
Suppose you’re hosting a corporate party in June with many important corporate bigwigs. It’s summer, and the bigwigs would prefer to be outside if you can make that happen. However, it’s a big party and if it rains, you wouldn’t be able to move it inside fast enough before people get mad. The bigwigs definitely don’t like rain. In fact, they’ve made this clear to you with the following incentive matrix:
|                  | It rains | It does not rain |
|------------------|----------|------------------|
| Party is outside | -$1000   | +$500            |
| Party is inside  | +$0      | +$0              |
So having the party inside is neutral and riskless, but having the party outside is a gamble. Using trusty expected value calculations, you calculate that you ought to host the party outside as long as $500(1 - p) > $1000p, where p is the chance that it rains. Using trusty math, you figure you ought to host the party outside as long as p < 1/3.
But what is p? You don’t know yet, but it rains 31% of the time in June based on historical data, so you decide it’s worth the risk, since you expect to win $35 on average (0.69*500 - 0.31*1000).
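The expected-value comparison above can be sketched in a few lines of Python. This is a minimal illustration, assuming the payoff matrix is stored as a nested dictionary; the names `PAYOFFS`, `expected_value`, and `best_action`, and the `"dry"` label for the no-rain state, are hypothetical choices for this sketch.

```python
# Payoff matrix from the example, in dollars: PAYOFFS[action][weather].
PAYOFFS = {
    "outside": {"rain": -1000, "dry": 500},
    "inside": {"rain": 0, "dry": 0},
}

def expected_value(action, p_rain):
    """Expected payoff of an action when it rains with probability p_rain."""
    payoff = PAYOFFS[action]
    return p_rain * payoff["rain"] + (1 - p_rain) * payoff["dry"]

def best_action(p_rain):
    """Action with the highest expected payoff at this rain probability."""
    return max(PAYOFFS, key=lambda action: expected_value(action, p_rain))

# Break-even: 500 * (1 - p) = 1000 * p  =>  p = 500 / (500 + 1000) = 1/3.
gain, loss = PAYOFFS["outside"]["dry"], -PAYOFFS["outside"]["rain"]
break_even = gain / (gain + loss)

p_rain = 0.31  # historical share of rainy June days used in the example
print(best_action(p_rain), expected_value("outside", p_rain), break_even)
```

Raising `p_rain` above the break-even value flips `best_action` to `"inside"`.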
However, a magical genie approaches you and makes you a deal: he’ll tell you with absolute certainty whether it will rain or not, but it will cost you. He asks you to name a price. How much are you willing to pay to know for certain whether it will rain?
If you knew for certain it wouldn’t rain, you could gain $500. Otherwise you could host the party inside and not gain or lose anything. Based on your prior guess of a 31% chance of rain[1], you would expect to gain $500 69% of the time (because you know it won’t rain so you choose to host your event outside) and $0 the other 31% of the time (because you know it will rain so you choose to host your event inside), for an expected value of $345. Thus using the genie’s advice improves on your initial strategy by $310, so the information would be worth $310.
Why does this calculation not include any probability of hitting the $1000? Because you have perfect information (assuming you pay for the information), you’ll know whether it is raining so you can hold your event inside, avoiding the $1000 penalty. If it is raining, you’ll know it is raining, so you’ll hold your event inside ($0) and if it is not raining, you’ll know it is not raining so you’ll hold your event outside (+$500). There’s no chance of being caught off guard with perfect information. In fact, most of the value of this information comes from avoiding the sizable risk of holding an outside event when it is raining.
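The genie calculation can be checked the same way: value the decision once using the prior alone, once with the weather known in advance, and take the difference. A minimal sketch, with hypothetical variable names:

```python
P_RAIN = 0.31
OUTSIDE = {"rain": -1000, "dry": 500}
INSIDE = {"rain": 0, "dry": 0}

# Without the genie: commit to the better action under the prior (outside, worth $35).
ev_without = max(
    P_RAIN * OUTSIDE["rain"] + (1 - P_RAIN) * OUTSIDE["dry"],
    P_RAIN * INSIDE["rain"] + (1 - P_RAIN) * INSIDE["dry"],
)

# With the genie: in each weather state, pick whichever action pays more.
ev_with = (P_RAIN * max(OUTSIDE["rain"], INSIDE["rain"])
           + (1 - P_RAIN) * max(OUTSIDE["dry"], INSIDE["dry"]))

value_of_information = ev_with - ev_without
print(ev_without, ev_with, value_of_information)
```

The per-state `max` in `ev_with` is where the information pays off: when rain is certain, the -$1000 branch is never chosen.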
(For more cool problems like this where value of information calculations are necessary, I recommend “Value of Information: Four Examples”.)
Applying Value of Information to Cause Prioritization
A similar, if very oversimplified, analysis can be applied to cause prioritization.
For this game, assume we’re a $30M foundation looking to allocate our philanthropy to either Mercy for Animals’ (MFA) online ads campaign or the Against Malaria Foundation’s (AMF) work to reduce malaria[2]. Imagine in this example that there’s no concern about room for more funding, so the payoff matrix looks like this:
|                   | Online ads work      | Online ads do not work |
|-------------------|----------------------|------------------------|
| Donate all to MFA | Save 1,000,000 lives | Save 0 lives           |
| Donate all to AMF | Save 10,000 lives    | Save 10,000 lives      |
Obviously there’s a lot of granularity to what it means that online ads “work”, there’s a lot of room to argue over how many lives would be saved by MFA even if online ads “worked”, and there’s a lot of room to quibble on the value of saving a nonhuman animal life relative to a human life, but this is a toy example so cut me some slack.
If p represents the probability that online ads work, we’d want to donate to MFA instead of AMF if 1,000,000p > 10,000. Math tells us that as long as p > 1% it’s worth the risk. Let’s say our prior for online ads working is 10%[3], so we’ll choose to donate to MFA. The expected value of our decision works out to 0.9*0 + 0.1*1M, saving 100,000 lives. Score! It’s great to be a multimillion dollar foundation!
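The foundation’s choice follows the same pattern as the party. This sketch assumes the payoff matrix above, with hypothetical names `LIVES` and `expected_lives`; the break-even shortcut of 10,000 / 1,000,000 only works because MFA saves nobody when ads fail and AMF’s result does not depend on the ads.

```python
LIVES = {
    "MFA": {"ads_work": 1_000_000, "ads_fail": 0},
    "AMF": {"ads_work": 10_000, "ads_fail": 10_000},
}

def expected_lives(charity, p_work):
    """Expected lives saved by giving everything to one charity."""
    outcome = LIVES[charity]
    return p_work * outcome["ads_work"] + (1 - p_work) * outcome["ads_fail"]

# MFA beats AMF when 1,000,000 * p > 10,000, i.e. p > 1%.
break_even = LIVES["AMF"]["ads_work"] / LIVES["MFA"]["ads_work"]

prior = 0.10  # the made-up prior from the text
choice = max(LIVES, key=lambda charity: expected_lives(charity, prior))
print(break_even, choice, expected_lives(choice, prior))
```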
This time it isn’t a genie who comes by, but a data scientist who has run ten somehow-perfect empirical studies on online ads and is willing to sell you the results. Not a big believer in open science, I guess… but either way, you’re still getting infallible truth about whether online ads actually work or not. How much is this information worth?
With perfect information, we’ll know with 100% certainty whether or not online ads work. However, when thinking about how to value perfect information before we get perfect information, we still have to go with our prior of 10% because we don’t have anything better yet. This means we assume there’s a 10% chance that p will be 1 (online ads work) upon receiving full information and a 90% chance that p will be 0 (online ads don’t work) upon receiving full information.
If p is found to be 1 then we will decide to donate to MFA and save 1,000,000 lives. This was already our strategy, so getting perfect information would not improve it. However, if p is found to be 0, we will switch our strategy to donating to AMF, saving 10,000 lives instead of 0 (because online ads don’t work). This represents an improvement to our strategy, since we wouldn’t do this unless we had the perfect information. Since this is estimated to be the case 90% of the time, this is an expected value of an additional 9,000 lives saved.
Put mathematically, our initial strategy is expected to realize 0.9*0 + 0.1*1M, saving an expected 100,000 lives. Our strategy upon perfect information is expected to realize 0.90*10000 + 0.10*1M, or 109,000 expected lives saved. Thus the information is worth what we’d pay to save 9,000 lives, which would be $27M at the current AMF rate[4]. That data scientist is going to get rich!
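The same “best action per state, minus best action overall” logic can be written once as a general function. This is a sketch under the essay’s toy numbers; `evpi` (short for expected value of perfect information) is a hypothetical helper name, and the $3,000-per-life figure is the $30M budget divided by AMF’s 10,000 lives, as in the simple version of endnote 4.

```python
def evpi(payoffs, p_states):
    """Expected value of perfect information for a finite decision problem.

    payoffs[action][state] is the outcome; p_states[state] is the prior.
    """
    # Best single action chosen on the prior alone.
    ev_without = max(
        sum(p * payoffs[action][state] for state, p in p_states.items())
        for action in payoffs
    )
    # Best action chosen separately for each state, weighted by the prior.
    ev_with = sum(
        p * max(payoffs[action][state] for action in payoffs)
        for state, p in p_states.items()
    )
    return ev_with - ev_without

LIVES = {
    "MFA": {"ads_work": 1_000_000, "ads_fail": 0},
    "AMF": {"ads_work": 10_000, "ads_fail": 10_000},
}
PRIOR = {"ads_work": 0.10, "ads_fail": 0.90}
COST_PER_LIFE_AMF = 30_000_000 / 10_000  # $3,000 per life at AMF in this toy example

lives = evpi(LIVES, PRIOR)
print(lives, lives * COST_PER_LIFE_AMF)
```

Passing the party matrix (outside: -1000 if rain, 500 if dry; inside: 0 either way) with a 31% chance of rain gives the genie’s $310 from the same function.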
Value of Imperfect Information
Of course, in real life, the data scientists are never perfect. But we can incorporate error into our calculations. Let’s say we’ve done some rigorous meta-research and we know that there is a 95% chance the data scientist is right and a 5% chance the data scientist is wrong about whatever they are saying.
This gives the following tree:
- We do not buy any information from the data scientist
  - Online ads work (10% chance)
    - 1,000,000 lives saved
  - Online ads do not work (90% chance)
    - 0 lives saved
- We buy information from the data scientist
  - Online ads work (expected to be a 10% chance based on our prior)
    - Data scientist says online ads work (95% chance)
      - We fund online ads and save 1,000,000 lives
    - Data scientist says online ads don’t work (5% chance)
      - We fund AMF and save 10,000 lives
  - Online ads don’t work (expected to be a 90% chance based on our prior)
    - Data scientist says online ads work (5% chance)
      - We fund online ads and save 0 lives
    - Data scientist says online ads don’t work (95% chance)
      - We fund AMF and save 10,000 lives
The value of our choice to go with online ads, absent information from the data scientist, is 0.1*1M or 100,000 estimated lives saved.
The value of our strategy, if improved upon by information from the data scientist, is (0.1)(0.95)(1M) + (0.1)(0.05)(10K) + (0.9)(0.95)(10K) + (0.9)(0.05)(0) = 103,600 estimated lives saved.
The strategy derived from the data scientist’s information therefore exceeds the value of our existing strategy by 3,600 estimated lives saved. Since we’d value this at $10.8M (3,600 lives at $3,000 each), buying the data scientist’s results is worth it at any price below that, even given the 5% error rate. Notably, however, the study is not as valuable as perfect information (which we valued at $27M).
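The imperfect-information tree can be evaluated leaf by leaf: multiply the probabilities along each path, then sum probability times lives saved. A sketch, assuming the 10% prior, the 95% accuracy figure, and $3,000 per AMF life; `leaves` and the other names are hypothetical.

```python
PRIOR, ACCURACY = 0.10, 0.95
COST_PER_LIFE_AMF = 3_000  # $30M / 10,000 lives

# Each leaf of the tree: (probability of reaching it, lives saved there).
leaves = [
    (PRIOR * ACCURACY, 1_000_000),          # ads work, analyst agrees -> MFA
    (PRIOR * (1 - ACCURACY), 10_000),       # ads work, analyst says no -> AMF
    ((1 - PRIOR) * (1 - ACCURACY), 0),      # ads fail, analyst says yes -> MFA
    ((1 - PRIOR) * ACCURACY, 10_000),       # ads fail, analyst agrees -> AMF
]
assert abs(sum(p for p, _ in leaves) - 1) < 1e-12  # the four paths cover every case

with_analyst = sum(p * lives for p, lives in leaves)
baseline = PRIOR * 1_000_000  # fund MFA on the prior alone
gain = with_analyst - baseline
print(round(with_analyst), round(gain), round(gain * COST_PER_LIFE_AMF))  # 103600 3600 10800000
```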
The Value of a Huckster
What if another person tries to sell you information, but they’re a well-known huckster who just makes a random guess (a 50% chance of being right)? That would give this tree:
- We buy information from the huckster
  - Online ads work (expected to be a 10% chance based on our prior)
    - Huckster says online ads work (50% chance)
      - We fund online ads and save 1,000,000 lives
    - Huckster says online ads don’t work (50% chance)
      - We fund AMF and save 10,000 lives
  - Online ads don’t work (expected to be a 90% chance based on our prior)
    - Huckster says online ads work (50% chance)
      - We fund online ads and save 0 lives
    - Huckster says online ads don’t work (50% chance)
      - We fund AMF and save 10,000 lives
(0.1)(0.5)(1M) + (0.1)(0.5)(10K) + (0.9)(0.5)(0) + (0.9)(0.5)(10K) = 55,000 estimated lives saved which is much worse than our strategy of just going with online ads by default.
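Why does this have negative value rather than just zero value? Because this tree assumes you act on whatever the huckster says. His guess carries no information about online ads, yet half the time it talks you out of MFA in the 10% of worlds where online ads work: in expectation you lose 49,500 lives there and gain only 4,500 lives by switching to AMF in the worlds where ads don’t work, a net loss of 45,000. Ignoring him would leave your decision, and its value, unchanged. (Of course, by luck, it could turn out that the huckster is more accurate than your prior.)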
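A sketch of why following the huckster hurts, under the essay’s numbers. `value_following_reports` and `posterior_if_says_work` are hypothetical helper names; the second applies Bayes’ rule to show that a 50%-accurate report leaves the 10% prior unchanged, so the best response is to ignore it.

```python
def value_following_reports(prior, accuracy, mfa_if_work=1_000_000, amf=10_000):
    """Expected lives saved if we always fund whatever one analyst recommends."""
    return (
        prior * accuracy * mfa_if_work          # ads work, analyst says so -> MFA
        + prior * (1 - accuracy) * amf          # ads work, analyst says no -> AMF
        + (1 - prior) * accuracy * amf          # ads fail, analyst says so -> AMF
        + (1 - prior) * (1 - accuracy) * 0      # ads fail, analyst says yes -> MFA, 0 lives
    )

def posterior_if_says_work(prior, accuracy):
    """Bayes' rule: P(ads work | analyst says they work)."""
    hit = prior * accuracy
    return hit / (hit + (1 - prior) * (1 - accuracy))

prior = 0.10
baseline = prior * 1_000_000                     # fund MFA without any report
huckster = value_following_reports(prior, 0.5)   # follow a coin flip
print(huckster - baseline)                       # negative: following costs lives
print(posterior_if_says_work(prior, 0.5))        # equals the prior: no information
```

With `accuracy=0.95` the same function returns the 103,600 lives from the data scientist’s tree.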
A Council of Data Scientists
But could the information always be tremendously valuable such that we keep buying information forever and never actually do anything?
Imagine that you find a second and a third data scientist who make estimates independently of each other and of the first data scientist, and who are also 95% accurate. We decide to go with the consensus of the three data scientists. That gives the following tree:
- Online ads work (expected to be a 10% chance based on our prior)
  - 3/3 data scientists say online ads work (0.95^3 = 0.857375)
    - Fund online ads = 1M lives saved
  - 2/3 data scientists say online ads work (3 * 0.95^2 * 0.05 = 0.135375)
    - Fund online ads = 1M lives saved
  - 1/3 data scientists say online ads work (3 * 0.95 * 0.05^2 = 0.007125)
    - Fund AMF = 10K lives saved
  - 0/3 data scientists say online ads work (0.05^3 = 0.000125)
    - Fund AMF = 10K lives saved
- Online ads don’t work (expected to be a 90% chance based on our prior)
  - 3/3 data scientists say online ads work (0.05^3 = 0.000125)
    - Fund online ads = 0 lives saved
  - 2/3 data scientists say online ads work (3 * 0.95 * 0.05^2 = 0.007125)
    - Fund online ads = 0 lives saved
  - 1/3 data scientists say online ads work (3 * 0.95^2 * 0.05 = 0.135375)
    - Fund AMF = 10K lives saved
  - 0/3 data scientists say online ads work (0.95^3 = 0.857375)
    - Fund AMF = 10K lives saved
This tree is thus worth 108,217 lives saved[5], which is an improvement of 8217 expected lives saved over our original strategy and an improvement of 4617 expected lives saved over going with just one data scientist.
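The arithmetic in endnote 5 can be double-checked by brute force, enumerating every combination of right and wrong data scientists instead of using the binomial counts. A minimal sketch, assuming independent analysts with equal accuracy and an odd number of them (with an even count, ties in this code default to AMF); `council_value` is a hypothetical name.

```python
from itertools import product

PRIOR, ACCURACY = 0.10, 0.95
MFA_IF_WORK, AMF = 1_000_000, 10_000

def council_value(n_analysts, prior=PRIOR, accuracy=ACCURACY):
    """Expected lives saved when we fund whatever a majority of n analysts says.

    Brute force: walk every leaf of the tree (true state x each analyst's report).
    """
    total = 0.0
    for ads_work, p_state in ((True, prior), (False, 1 - prior)):
        # Each analyst is independently right with probability `accuracy`.
        for right in product((True, False), repeat=n_analysts):
            p_leaf = p_state
            for r in right:
                p_leaf *= accuracy if r else 1 - accuracy
            # A right analyst reports the true state; a wrong one reports the opposite.
            votes_for_ads = sum(r == ads_work for r in right)
            if votes_for_ads > n_analysts / 2:
                lives = MFA_IF_WORK if ads_work else 0  # fund online ads
            else:
                lives = AMF  # fund AMF
            total += p_leaf * lives
    return total

print(round(council_value(1)), round(council_value(3)))  # 103600 108217
```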
We can then compare strategies:
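| Strategy                    | Expected lives saved | Improvement | Improvement per data scientist |
|-----------------------------|----------------------|-------------|--------------------------------|
| 0 data scientists (default) | 100,000              | 0           | N/A                            |
| 1 data scientist            | 103,600              | 3,600       | 3,600                          |
| 3 data scientists           | 108,217              | 8,217       | 2,739                          |
| 5 data scientists[6]        | 108,874.92           | 8,874.92    | 1,774.98                       |
| Perfect information         | 109,000              | 9,000       | N/A                            |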
This shows there are clearly diminishing marginal returns to hiring additional data scientists (that is, to increasing the accuracy of your information), with the total asymptotically approaching the value of perfect information. Eventually there would be some point where hiring more data scientists isn’t worth it[7].
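The comparison table can be regenerated with the binomial formula instead of drawing each tree. This sketch assumes independent analysts and majority voting with an odd number of analysts, as in the text; the helper names are hypothetical.

```python
from math import comb

PRIOR, ACCURACY = 0.10, 0.95
MFA_IF_WORK, AMF = 1_000_000, 10_000

def p_majority_right(n, accuracy=ACCURACY):
    """P(more than half of n independent analysts are right); n assumed odd."""
    return sum(
        comb(n, k) * accuracy**k * (1 - accuracy) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

def council_value(n):
    """Expected lives saved when following the majority of n analysts."""
    q = p_majority_right(n)
    # Ads work: a right majority funds MFA, a wrong majority funds AMF.
    # Ads fail: a right majority funds AMF, a wrong majority funds MFA (0 lives).
    return PRIOR * (q * MFA_IF_WORK + (1 - q) * AMF) + (1 - PRIOR) * q * AMF

baseline = PRIOR * MFA_IF_WORK            # fund MFA on the prior alone
perfect = baseline + (1 - PRIOR) * AMF    # switch to AMF whenever ads fail
rows = []
for n in (1, 3, 5):
    gain = council_value(n) - baseline
    rows.append((n, council_value(n), gain, gain / n))
    print(f"{n} analysts: {council_value(n):,.2f} lives, "
          f"gain {gain:,.2f}, gain per analyst {gain / n:,.2f}")
print(f"perfect information: {perfect:,.0f} lives")
```

With these numbers the value simplifies to 1,000 + 108,000q, where q is the probability that the majority is right, so it only reaches 109,000 when q = 1. The falling gain-per-analyst column is the diminishing return described above.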
The Simple Formula for Valuing Cause Prioritization
We can take a step back and think about this more generally. As we saw above, information is valuable only when it lets you change your decision in cases where your original decision was wrong. If your imperfect prior already led you to the best strategy, information that merely confirms it adds nothing. We can thus state:
Expected value of a particular cause prioritization effort ~= probability of changing your decision * improvement from that changed decision
Similarly…
Cost of particular cause prioritization effort = amount of time and money to acquire the desired information
And thus we should pursue a particular cause prioritization effort when its expected value exceeds its cost.
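As a check on the formula, the one-data-scientist case can be split into its two factors. This sketch assumes the 95%-accurate data scientist and the 10% prior; the decision changes only when the report says online ads don’t work, so the improvement is measured against sticking with MFA on that branch. Variable names are hypothetical.

```python
PRIOR, ACCURACY = 0.10, 0.95
MFA_IF_WORK, AMF = 1_000_000, 10_000

# Default decision on the prior alone: fund MFA (100,000 expected lives).
# The decision changes only when the analyst reports "online ads don't work".
p_change = PRIOR * (1 - ACCURACY) + (1 - PRIOR) * ACCURACY  # P(negative report)
p_work_given_change = PRIOR * (1 - ACCURACY) / p_change     # Bayes' rule
value_if_kept = p_work_given_change * MFA_IF_WORK           # stick with MFA anyway
value_if_changed = AMF                                      # switch to AMF
improvement = value_if_changed - value_if_kept

value_of_information = p_change * improvement
print(round(p_change, 2), round(improvement, 2), round(value_of_information, 2))
```

Here the product equals the 3,600 lives from the tree exactly, because the improvement is averaged over the branch where the decision changes, using the posterior chance that online ads work to value the MFA option we abandon.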
Getting Less Naïve
However, we have to keep in mind some additional wrinkles that could make cause prioritization more or less valuable:
There is value of information from acting. While there likely isn’t that much to learn from implementing more marginal online ads, there could be a lot of value in implementing something more novel (e.g., a Charity Entrepreneurship pilot project) before studying, since we’d likely learn a lot about the intervention and expose a lot of unknown unknowns by seeing it in the field.
There is a cost of delay of acting on the best guess. If the study of online ads takes a year to implement and we don’t fund any online ads during that year, we potentially lose out on whatever value creating vegetarians a year earlier would have (assuming online ads work). Perhaps we could adjust for this by making use of present value calculations on the benefits achieved after the study is run.
There may not be better giving opportunities to find. The value of information comes only from successfully changing our decision, and it is higher when the new decision is much better than the old one. However, if we assume that good giving opportunities are diminishing in value, or that all the high-leverage opportunities have already been found, we would expect few good opportunities to be left for better information to find.
The world could change after the study is run. Even if the study was perfect, underlying conditions could change so as to render the information incorrect or irrelevant. For a more extreme example, even if online ads were proven to work, Facebook could end up banning online ads for animal rights organizations.
There is value in sharing your information with others. 80K argues in “The Value of Coordination” that EA is best thought of as a multiplayer game with strong value in sharing information. Thus any information you gather should be valued not just for how it can affect your own decisions, but in how it can affect the decisions of others that your information could potentially influence.
Endnotes
[1]: This prior is given as an explicit point estimate. To be more accurate, we should express our prior as a probability distribution with a certain mean and standard deviation. We would then receive new information as a modified distribution with a mean and standard deviation and then we would perform a Bayesian update between the prior distribution and the evidence distribution to get a posterior distribution. However, this more complex math is beyond the scope of this essay.
[2]: I suppose this could be roughly taken to represent the situation OpenPhil is in.
[3]: This number is completely made up.
[4]: The value of the information here is actually a bit complex, because you don’t really know your money-to-lives conversion rate until you actually get the information, since it depends on which intervention is best. Your best bet is to use your prior between the two, with a 90% chance of AMF being the best and a 10% chance of MFA being the best, for a weighted average of 0.9*$3000 + 0.1*$30 = $2703 per life saved, making the information worth ~$24.3M.
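The blended conversion rate is simple arithmetic; a short sketch with the essay’s toy numbers (variable names are hypothetical):

```python
BUDGET = 30_000_000
cost_per_life_amf = BUDGET / 10_000     # $3,000 if AMF is the best option
cost_per_life_mfa = BUDGET / 1_000_000  # $30 if online ads work
blended = 0.9 * cost_per_life_amf + 0.1 * cost_per_life_mfa
info_value_dollars = 9_000 * blended    # 9,000 lives from the perfect-information case
print(blended, info_value_dollars)
```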
[5]: (0.1)(0.857375)(1M) + (0.1)(0.135375)(1M) + (0.1)(0.007125)(10K) + (0.1)(0.000125)(10K) + (0.9)(0.000125)(0) + (0.9)(0.007125)(0) + (0.9)(0.135375)(10K) + (0.9)(0.857375)(10K) =
(0.1)(0.857375 + 0.135375)(1M) + (0.1)(0.007125 + 0.000125)(10K) + (0.9)(0.000125 + 0.007125)(0) + (0.9)(0.135375 + 0.857375)(10K) =
(0.1)(0.99275)(1M) + (0.1)(0.00725)(10K) + (0.9)(0.00725)(0) + (0.9)(0.99275)(10K) =
(0.099275)(1M) + 0.000725(10K) + 0.893475(10K) =
(0.099275)(1M) + (0.000725 + 0.893475)(10K) =
(0.099275)(1M) + (0.8942)(10K) = 108,217
[6]: For further demonstration and discussion, here is a tree with five data scientists:
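- Online ads work (expected to be a 10% chance based on our prior)
  - 5/5 data scientists say online ads work (0.95^5 = 0.773780938)
    - Fund online ads = 1M lives saved
  - 4/5 data scientists say online ads work (5 * 0.95^4 * 0.05 = 0.203626562)
    - Fund online ads = 1M lives saved
  - 3/5 data scientists say online ads work (10 * 0.95^3 * 0.05^2 = 0.021434375)
    - Fund online ads = 1M lives saved
  - 2/5 data scientists say online ads work (10 * 0.95^2 * 0.05^3 = 0.001128125)
    - Fund AMF = 10K lives saved
  - 1/5 data scientists say online ads work (5 * 0.95 * 0.05^4 = 0.000029688)
    - Fund AMF = 10K lives saved
  - 0/5 data scientists say online ads work (0.05^5 = 0.000000312)
    - Fund AMF = 10K lives saved
- Online ads don’t work (expected to be a 90% chance based on our prior)
  - 5/5 data scientists say online ads work (0.05^5 = 0.000000312)
    - Fund online ads = 0 lives saved
  - 4/5 data scientists say online ads work (5 * 0.95 * 0.05^4 = 0.000029688)
    - Fund online ads = 0 lives saved
  - 3/5 data scientists say online ads work (10 * 0.95^2 * 0.05^3 = 0.001128125)
    - Fund online ads = 0 lives saved
  - 2/5 data scientists say online ads work (10 * 0.95^3 * 0.05^2 = 0.021434375)
    - Fund AMF = 10K lives saved
  - 1/5 data scientists say online ads work (5 * 0.95^4 * 0.05 = 0.203626562)
    - Fund AMF = 10K lives saved
  - 0/5 data scientists say online ads work (0.95^5 = 0.773780938)
    - Fund AMF = 10K lives saved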
(0.1)(0.773780938)(1M) + (0.1)(0.203626562)(1M) + (0.1)(0.021434375)(1M) + (0.1)(0.001128125)(10K) + (0.1)(0.000029688)(10K) + (0.1)(0.000000312)(10K) + (0.9)(0.000000312)(0) + (0.9)(0.000029688)(0) + (0.9)(0.001128125)(0) + (0.9)(0.021434375)(10K) + (0.9)(0.203626562)(10K) + (0.9)(0.773780938)(10K) =
(0.1)(0.773780938 + 0.203626562 + 0.021434375)(1M) + (0.1)(0.001128125 + 0.000029688 + 0.000000312)(10K) + (0.9)(0.021434375 + 0.203626562 + 0.773780938)(10K) =
(0.1)(0.998841875)(1M) + (0.1)(0.001158125)(10K) + (0.9)(0.998841875)(10K) =
0.099884188(1M) + 0.000115812(10K) + 0.898957688(10K) =
(0.099884188)(1M) + (0.000115812 + 0.898957688)(10K) =
(0.099884188)(1M) + (0.8990735)(10K) = 108,874.92
[7]: Another approach would be to hire a single data scientist, see what they say, and then recalculate whether you should hire a second data scientist. Figuring out how many scientists to hire in advance is akin to increasing the accuracy of a particular experiment, whereas a “wait and see” approach is more akin to figuring out how many experiments to run. You’d then get a chance to adjust your prior between steps and recalculate. But this is also beyond the scope of this essay.
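The “wait and see” approach can be sketched with Bayes’ rule: after each report, update the prior, then ask whether one more data scientist is still worth hiring. This is only an illustration under the essay’s assumptions (a 95%-accurate analyst with the same error rate in both directions, and payoffs of 1,000,000 lives for MFA if ads work versus 10,000 for AMF); all function names are hypothetical.

```python
MFA_IF_WORK, AMF, ACCURACY = 1_000_000, 10_000, 0.95

def best_value(p_work):
    """Expected lives from the better option (MFA vs. AMF) at belief p_work."""
    return max(p_work * MFA_IF_WORK, AMF)

def posterior(p_work, says_work, accuracy=ACCURACY):
    """Bayes' rule for one report from an analyst with a symmetric error rate."""
    like_if_work = accuracy if says_work else 1 - accuracy
    like_if_fail = 1 - accuracy if says_work else accuracy
    numerator = p_work * like_if_work
    return numerator / (numerator + (1 - p_work) * like_if_fail)

def value_of_next_analyst(p_work, accuracy=ACCURACY):
    """Expected gain from one more report, acting on the updated belief."""
    p_says_work = p_work * accuracy + (1 - p_work) * (1 - accuracy)
    with_report = (
        p_says_work * best_value(posterior(p_work, True, accuracy))
        + (1 - p_says_work) * best_value(posterior(p_work, False, accuracy))
    )
    return with_report - best_value(p_work)

belief = 0.10
print(value_of_next_analyst(belief))          # value of the first analyst
belief = posterior(belief, says_work=False)   # suppose the first report is negative
print(belief, value_of_next_analyst(belief))  # recalculate before hiring a second
```

At the original 10% prior, `value_of_next_analyst` reproduces the 3,600 lives computed for a single data scientist; calling it again on the updated belief is the recalculation step described above.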