A General Treatment of the Moral Value of Information

SamNolan

tldr: You can generalize the Moral Value of Information to a neat and simple formula for easy (even mental) calculation. This can also be matched with Bayesian reasoning to work out the value of information for any given trial. This article is mainly the mathematical derivation of these concepts. In this very article we find that the moral value of information is proportional to the chance that you will change your actions for the better because of it. Let that be a warning! If you are not interested in calculating the moral value of information for good, or changing the way you do things from reading this article, then the moral value of this information is 0!

The really quick and easy formula is as follows:

E (T) = \frac{n p e}{c}

Where $E (T)$ is the marginal expected value of the trial, $n$ is the number of times the information is used to produce a benefit, $p$ is the probability that you will change your mind given the results of the trial, $e$ is the marginal value of changing your mind given the results of the trial, and $c$ is the cost of the trial. However, it's very dangerous to go around applying formulas you don't understand, so read on if you want to actually use it. If you don't understand the concepts after reading this article, please comment and I'll try to clarify as much as I can.

I, among many others, was incredibly fascinated by Amanda Askell's Moral Value of Information. Considering that a large amount of Effective Altruism requires gaining information, and that I would love to know whether it's valuable to gain more information in some areas, I became interested in trying to generalize the concepts in her presentation using probability theory. I do not have access to the original article on this, so I'm unaware whether this style of treatment has already been done, but nevertheless I thought it was a fascinating exercise in probability. This should hopefully give some insight into when we should be gathering information about different interventions.

To recap the concepts in Amanda Askell's presentation, she shows that sometimes it's more valuable to gain information about interventions. Particularly, it could be more important to gain information about interventions that we are not familiar with.

The example that she gives is slightly more complicated than what is actually required for this article. So I'll give a simpler example. We have a disease, and there are now two drugs for that disease. I will call them drug A and drug B:

	Cost	Effectiveness	Expected Value
Drug A	$10	10 years	1 year / $
Drug B	$10	12 years	1.2 years / $

In this case, drug A is very well known to increase the lifespan of a patient by 10 years. Drug B has not been tested as much, but based on some initial trials it is expected to be more effective than drug A.

If a doctor is looking to prescribe a drug, then following simple expected value calculations would lead them to prescribe drug B. It should be noted, before anyone gets up in arms about not applying a prior, that these effectiveness calculations are based on the posterior. The prior has already been applied. This might be a bit unrealistic in an actual medical scenario, since doctors would probably prefer to prescribe a drug that has more evidence behind it, but we'll assume this for the sake of the exercise.

Now suppose we have the opportunity to run a trial for drug B, and this trial has the following possible outcomes, each with probability 0.5:

	Low Effect (0.5 probability)	High Effect (0.5 probability)
Drug B	2 years	20 years

This would mean that we would get the following expected values depending on whether the trial showed a high or low effect:

	Low Effect	High Effect
No Trial	2 years	20 years
Trial	10 years	20 years

It should be noted that for No Trial - Low Effect, No Trial - High Effect and Trial - High Effect, drug B is being prescribed. However, in the case of Trial - Low Effect, the doctor would opt for drug A, having found that it is the better option.

Take a while to digest this. It shows that, using expected value alone, there is value gained from information. In particular, it was important to invest in the trial of the less certain drug. If need be, check out Amanda Askell's talk on this.

This is fascinating, but I believe it's only really the beginning of this type of discussion. There are a few things I would like to resolve before I can feel like I understand this concept:

How can I calculate the expected value of a trial? Is there a formulaic way to do it? When does the amount of money spent on a trial become not worth the effort?
Most of the time, uncertain variables follow continuous distributions. How would this be generalized to describe continuous uncertainty?
Pretty much no trials completely determine the effectiveness of an intervention with complete certainty. Is there a way that we can, from the power of a study alone, work out how much expected value comes from the trial?

We'll work our way up in generality, going from a simple discrete formula of the given example, into a general method for making these calculations regardless of the trial and variables.

Formula for the Moral Value of Information

One thing that's very important to realize about this argument is that it is entirely dependent on having more than one intervention. If you only have one drug to prescribe, you may as well just prescribe it, no matter what information you have on it. The ability to gain that moral value is dependent on the ability to choose between the interventions. This should be pretty obvious, but when I first cracked my head against this problem, I thought there was inherent moral value in decreasing the variance of the posterior. In this argument, there certainly is not.

I'll copy the table above down here again, as we'll be referencing it to construct a formula:

	Low Effect (0.5 probability)	High Effect (0.5 probability)
No Trial	2 years	20 years
Trial	10 years	20 years

The expected value of not having a trial in the above example is $0.5 \times 2 + 0.5 \times 20 = 11$, whereas the expected value of having a trial is $0.5 \times 10 + 0.5 \times 20 = 15$. This makes a marginal expected value of 4 years per prescribed person. So if we multiply this value by the number of prescribed people, and divide it by the cost of the trial, we end up with the equation:

E (T) = n \frac{(p a + (1 - p) b_{h}) - (p b_{l} + (1 - p) b_{h})}{c} = n p \frac{a - b_{l}}{c}

Where $E (T)$ is the expected value of the trial, $n$ is the number of prescriptions made from that information, $c$ is the cost of the test, $p$ is the probability that the test would give a low result (and $1 - p$ the probability of a high result), $a$ is the expected value of Drug A, $b_{l}$ is the possible low value of drug B and $b_{h}$ is the possible high value of drug B.

There are a large number of variables here. Thankfully a lot of them cancel out. Take a while to digest this before moving on. If you're having difficulty here then you might have trouble with the later parts.

Interestingly, the value of the trial is not at all dependent on the high value for drug B, which simply cancels out. In the end, it depends on the difference between $a$ and $b_{l}$ , and how probable it is that A is actually the better choice. You can think about this as the value of changing your mind for the better, multiplied by the chance of changing your mind.
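As a quick check of the cancellation, here is a minimal Python sketch of both forms of the formula. The function names are hypothetical, and $n = 1$ and $c = 1$ are placeholder assumptions, since the drug example does not give a trial cost.

```python
def discrete_trial_value(n, p, a, b_low, b_high, c):
    """Marginal expected value of a perfect two-outcome trial, per unit cost.

    p is the probability of the low result; the argument names mirror the
    symbols a, b_l and b_h in the formula.
    """
    # With the trial: prescribe A after a low result, B after a high result.
    with_trial = p * a + (1 - p) * b_high
    # Without the trial: always prescribe B.
    without_trial = p * b_low + (1 - p) * b_high
    return n * (with_trial - without_trial) / c


def simplified_trial_value(n, p, a, b_low, c):
    """The simplified form n * p * (a - b_l) / c, which never needs b_h."""
    return n * p * (a - b_low) / c


# Numbers from the drug example; n = 1 and c = 1 are placeholder assumptions.
full = discrete_trial_value(n=1, p=0.5, a=10, b_low=2, b_high=20, c=1)
short = simplified_trial_value(n=1, p=0.5, a=10, b_low=2, c=1)
print(full, short)  # both are 4.0, the 4 years per prescribed person
```

Changing `b_high` to any other value leaves `full` unchanged, which is the cancellation described in the text.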

However, as much as this is interesting, real trials are not usually that simple, and most variables when discussing interventions are continuous, so it's worth extending our understanding in that direction.

Continuous Moral Value of Information

In this version we have two random variables, $A$ and $B$ , that represent the expected value of drugs A and B respectively. The expected value of drug A in this case is $E (A)$ .

We now have a trial that finds out the true value of $B$ , we'll call this true value $b$ . It's a perfect trial, and gives an exact value for what $b$ actually is. This $b$ is sampled from $B$ .

So our question is, what is the value of performing this trial?

To answer this question, we need to, as above, work out what results from different values of $b$ . In this case, if $b > E (A)$ , the trial found that drug B will still be the better drug and so $b$ units of value would be received from prescribing it. If however, $b < E (A)$ , then drug A would have actually been the better option, giving $E (A)$ units of value.

This means that no matter what the results of the trial are, it's impossible to end up prescribing something with value that's less than $E (A)$ . The value of what we are prescribing is:

\max (E (A), b)

You can imagine (or not imagine) the parts of $B$ that are below $E (A)$ being impossible. This (sort of) creates a truncated distribution, which has probability $0$ for all $x < E (A)$ , except that there is a big lump of probability mass at $E (A)$ . It otherwise follows the distribution of $B$ . Because all the probability mass in $B$ that was below $E (A)$ is lumped at $E (A)$ , the expected value is larger than just prescribing drug B as is.

The formula to calculate this expected value of the trial now would be

E (A) \Pr (B < E (A)) + E (B | B > E (A)) \Pr (B > E (A))

Don't worry, it does get simpler. The left term represents the value if A is the better drug, multiplied by its probability, and the right term is the value if B is the better drug, multiplied by its probability.
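To see that the two-term formula really is the expected value of $\max (E (A), b)$ , here is a small simulation sketch. The normal distribution for $B$ (mean 12, standard deviation 5) is purely an illustrative assumption; nothing in the argument requires it.

```python
import random

random.seed(0)

E_A = 10.0  # expected value of drug A, as in the drug example
# Assumption for illustration only: B ~ Normal(mean=12, sd=5).
samples = [random.gauss(12.0, 5.0) for _ in range(200_000)]

# Direct estimate: after a perfect trial we prescribe the better drug.
value_with_trial = sum(max(E_A, b) for b in samples) / len(samples)

# The same quantity through the two-term formula.
below = [b for b in samples if b < E_A]
above = [b for b in samples if b >= E_A]
pr_below = len(below) / len(samples)
pr_above = len(above) / len(samples)
mean_above = sum(above) / len(above)
two_term = E_A * pr_below + mean_above * pr_above

# Value of ignorance: prescribe B without a trial.
value_without_trial = sum(samples) / len(samples)

print(value_with_trial, two_term, value_without_trial)
```

The first two printed numbers agree up to floating point error, and both are larger than the third, which is the gain from lumping the low outcomes at $E (A)$ .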

Because we are currently prescribing B, the expected value of ignorance is $E (B)$ .

This gives the formula for marginal expected value:

E (T) = n \frac{E (A) \Pr (B < E (A)) + E (B | B > E (A)) \Pr (B > E (A)) - E (B)}{c}

E (T) = n \Pr (B < E (A)) \frac{E (A) - E (B | B < E (A))}{c}

Math Notes: I'm skipping over a few steps in the derivations here. If you're interested in seeing the full derivations, ask in the comments. Furthermore, $n \Pr ()$ does not represent the combinatorics function, but instead represents $n \times \Pr ()$ .
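
For readers who want the main skipped step, here is a sketch using the law of total expectation. It assumes $B$ is continuous, so the event $B = E (A)$ has probability zero and can be ignored:

E (B) = E (B | B < E (A)) \Pr (B < E (A)) + E (B | B > E (A)) \Pr (B > E (A))

Substituting this for $E (B)$ in the numerator cancels the two $E (B | B > E (A)) \Pr (B > E (A))$ terms, leaving:

E (A) \Pr (B < E (A)) - E (B | B < E (A)) \Pr (B < E (A)) = \Pr (B < E (A)) (E (A) - E (B | B < E (A)))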

Wow! That looks a lot like our discrete case. As a final exposition on this topic, we have assumed that B, the drug with the larger expected value, is being prescribed and researched. There's one more case we need to consider, that is where $E (B) < E (A)$ and we are researching the less effective intervention. This case is identical to the above except that drug A is being prescribed by default, so the value of ignorance is $E (A)$ . This gives:

E (T) = n \frac{E (A) \Pr (B < E (A)) + E (B | B > E (A)) \Pr (B > E (A)) - E (A)}{c}

E (T) = n \Pr (B > E (A)) \frac{E (B | B > E (A)) - E (A)}{c}

These two equations look incredibly similar to our discrete version, and seem to give a more general formula that can be used without much mental effort:

E (T) = \frac{n p e}{c}

Where $p$ is the probability that you will change your mind, $e$ is the marginal expected value of changing your mind given that it is actually the better option, and $n$ and $c$ are the number of times this information is used and the cost of the trial respectively.
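
As a concrete instance, here is a minimal Python sketch of $n p e / c$ for a perfect trial, under the added assumption that $B$ is normally distributed (this section itself does not assume any particular distribution). The function names are hypothetical, and the conditional means are the standard truncated normal means.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def perfect_trial_value(mean_a, mean_b, sd_b, n=1.0, c=1.0):
    """E(T) = n * p * e / c for a perfect trial on B, assuming B is normal."""
    z = (mean_a - mean_b) / sd_b
    if mean_b >= mean_a:
        # B is prescribed by default; we change our mind when b < E(A).
        p = norm_cdf(z)
        cond_mean = mean_b - sd_b * norm_pdf(z) / p  # E(B | B < E(A))
        e = mean_a - cond_mean
    else:
        # A is prescribed by default; we change our mind when b > E(A).
        p = 1.0 - norm_cdf(z)
        cond_mean = mean_b + sd_b * norm_pdf(z) / p  # E(B | B > E(A))
        e = cond_mean - mean_a
    return n * p * e / c


# Illustrative numbers (assumptions): E(A) = 10, B ~ Normal(12, 5).
wide = perfect_trial_value(mean_a=10.0, mean_b=12.0, sd_b=5.0)
narrow = perfect_trial_value(mean_a=10.0, mean_b=12.0, sd_b=1.0)
print(wide, narrow)
```

With these numbers the wide prior gives a much larger value than the narrow one, since both the chance of changing your mind and the gain from doing so shrink as $B$ becomes more certain. The sketch does not guard against $p$ underflowing to zero for extreme inputs.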

I created a Desmos graphing calculator version of this. You can play around with different information for each of the drugs to see how the moral value of the information changes.

This creates some joyful insights into the moral value of information:

The expected value is proportional to the chance that you will change your mind. This means that information is not valuable if it does not change your actions or is unlikely to change your actions. Or to quote Stephen R Covey, "to know and not to do, is really not to know". A nice little life lesson hidden in the math.
Moral value of information is also proportional to the marginal impact of that information given that it's the better option. This means that heavy-tailed distributions, where the option, if it turns out to be better, has a much larger expected value than the alternative, have a much larger value of information.

This equips us with more than enough information to tackle the Bayesian case!

Value of Bayesian Moral Information

Epistemic status: I'm not as certain about my derivations and claims in this section as I am in the previous ones. Take this with a grain of salt. If you truly are an incredible person, check my math!

There is pretty much never a case where the trial is so perfect that it tells you exactly what B actually is. In this version, we find the moral value of information when our trials only give us more information about what B is.

In a Bayesian case, we need to ask whether the evidence would change our mind. This means we need to know what's the probability of getting results that are extreme enough to have the expected value of the posterior lower than $E (A)$ . This has the intuitive consequence that low powered trials are less valuable, because the chance that you will change your mind from the result of a low powered trial is lower. In this, we will consider the standard deviation ( $σ$ ) of the likelihood to be the power of the experiment. The lower the $σ$ , the higher the power.

I don't believe that this is possible to analytically solve for in the general case. However, we can do this if we assume two normal distributions for A and B and a normal likelihood, then use conjugate priors.

If $x$ represents the result that we would get from the trial, we need to work out what value of $x$ would change our mind. I'll call this threshold value $x_{0}$ . This threshold would cause the posterior mean to equal $E (A)$ , so, using the normal conjugate with a known $σ$ :

E (A) = \frac{1}{\frac{1}{σ_{0}^{2}} + \frac{1}{σ^{2}}} (\frac{μ_{0}}{σ_{0}^{2}} + \frac{x_{0}}{σ^{2}})

x_{0} = (E (A) - μ_{0}) \frac{σ^{2}}{σ_{0}^{2}} + E (A)

Math note: These formulas are from the Wikipedia page on conjugate priors. Again, I'm skipping a lot of steps here. Please comment if you want to see them.

Where $σ_{0}$ is the prior's standard deviation, and $μ_{0}$ is the prior's mean.
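
The threshold can be checked numerically. The sketch below uses hypothetical function names and illustrative numbers ( $E (A) = 10$ , prior mean 12, prior standard deviation 5, likelihood standard deviation 3), none of which come from the example above.

```python
def posterior_mean(prior_mean, prior_sd, x, likelihood_sd):
    """Posterior mean for a normal prior and a single normal observation x."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / likelihood_sd**2
    weighted_sum = prior_mean * prior_precision + x * data_precision
    return weighted_sum / (prior_precision + data_precision)


def threshold_result(mean_a, prior_mean, prior_sd, likelihood_sd):
    """Trial result x_0 at which the posterior mean of B equals E(A)."""
    return (mean_a - prior_mean) * likelihood_sd**2 / prior_sd**2 + mean_a


# Illustrative numbers, all assumptions.
x0 = threshold_result(mean_a=10.0, prior_mean=12.0, prior_sd=5.0, likelihood_sd=3.0)
check = posterior_mean(prior_mean=12.0, prior_sd=5.0, x=x0, likelihood_sd=3.0)
print(x0, check)  # x0 is about 9.28 and the posterior mean lands back on 10
```

Note that the threshold sits below $E (A)$ here: because the prior pulls the posterior towards 12, the trial has to come in under 10 before the posterior mean drops to 10.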

I don't believe I can analytically solve for the probability that $x$ is more extreme than this number, so I created a Desmos calculator version of it as well. From this, you can gain another few insights:

If $E (B)$ is much larger than $E (A)$ , then there is a low chance you will change your mind, and the value of information goes down. That is to say, if we're already pretty sure B is better than A, there's little value in researching B or A.
If the variance of $B$ is already pretty low, there is less value of information. That is to say it's more useful to get information about things we don't know about.
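
These insights can also be explored numerically without the calculator. The following is a sketch only, not the Desmos implementation, and it rests on assumptions beyond the text: under the normal prior and normal likelihood, the trial result $x$ is, before the trial, normally distributed with mean $μ_{0}$ and variance $σ_{0}^{2} + σ^{2}$ (the standard prior predictive for this model), and the posterior mean is a linear function of $x$ , so it is normal too. The function name and the numbers are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def imperfect_trial_value(mean_a, prior_mean, prior_sd, likelihood_sd, n=1.0, c=1.0):
    """Return (p, n * p * e / c) for a noisy trial on B.

    Assumes B is prescribed by default (prior_mean >= mean_a), a normal
    prior on B and a normal likelihood with known likelihood_sd.
    """
    # Spread of the trial result x before the trial is run.
    predictive_sd = math.sqrt(prior_sd**2 + likelihood_sd**2)
    # Threshold result at which the posterior mean equals E(A).
    x0 = (mean_a - prior_mean) * likelihood_sd**2 / prior_sd**2 + mean_a
    z = (x0 - prior_mean) / predictive_sd
    p = norm_cdf(z)  # probability that the trial changes our mind
    # Standard deviation of the posterior mean, viewed before the trial.
    post_mean_sd = prior_sd**2 / predictive_sd
    # Expected posterior mean of B, given that we change our mind.
    cond_mean = prior_mean - post_mean_sd * norm_pdf(z) / p
    e = mean_a - cond_mean
    return p, n * p * e / c


# Illustrative numbers (assumptions): E(A) = 10, prior on B is Normal(12, 5).
p_precise, v_precise = imperfect_trial_value(10.0, 12.0, 5.0, likelihood_sd=1e-6)
p_noisy, v_noisy = imperfect_trial_value(10.0, 12.0, 5.0, likelihood_sd=3.0)
p_sure, v_sure = imperfect_trial_value(10.0, 20.0, 5.0, likelihood_sd=3.0)
print(p_precise, v_precise, p_noisy, v_noisy, p_sure, v_sure)
```

With these inputs, the noisier trial has both a lower chance of changing your mind and a lower value than the nearly perfect one, and raising the prior mean of B well above $E (A)$ shrinks both further, matching the insights listed.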

This was a very fascinating exposition of the concept. I'd be interested in seeing this type of thing value the work of research. I'm definitely interested in giving it a go with my research tasks.

Effective Altruism Forum