
Below is an article based on a transcript of "The Moral Value of Information", a popular 2017 talk by the NYU philosopher Amanda Askell. She argues that we often underestimate the value of new information or knowledge when thinking of how to do good.

James Aung and Jessica Wong created this heavily edited version of the original transcript in order to condense, clarify, and clean up the argument.

The Talk

I'm going to start by making two claims. The first claim is that we generally prefer interventions with more evidential support, all else being equal. I hope you find that claim plausible, and I’ll go into detail later about what it means. The second claim I’m going to argue for is that having less evidence in favor of a given intervention means that your credences about the effectiveness of that intervention are what I call “low resilience”.

This second claim has been explored to some extent in decision theory. The point is that two credences can have the same value yet differ in resilience. So, if I thought there was a 50% chance that I would get $100, there's a difference between a low resilience 50% and a high resilience 50%.

I’m going to argue that, if your credences in a particular domain are low resilience, then the value of information in this domain is generally higher than it would be in a domain where your credences are high resilience. And, I’m going to argue that this means in many cases, we should prefer interventions with less evidential support, all else being equal. Hopefully, you’ll find that conclusion counterintuitive and interesting.

The first thing to say is that we generally think that expected value calculations are a good way of estimating the effectiveness of a given intervention. For example, let's imagine that there are two diseases: (very novelly named) Disease A and Disease B [Figure 1].

Say these two diseases are virtually impossible to differentiate. They both have the same symptoms, and they cause the same reduction in life expectancy, etc. Their key difference is that they respond very differently to different treatments, so any doctor who finds themselves with a patient with one of these conditions is in a difficult situation.

They can prescribe Drug A, which costs $100. If the patient has Disease A, then Drug A will extend their life by another 10 years. If, on the other hand, the patient has Disease B, it won't extend their life at all: they will die of Disease B, because Disease B is completely non-responsive to Drug A. The expected years of life we get from Drug A are therefore 0.05 per dollar. Drug B works in a very similar way, except that it treats Disease B; Disease A is completely non-responsive to it. So it has the same expected value as Drug A.
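As a minimal sketch of that arithmetic, assuming (as the indistinguishability of the two diseases suggests) a 50/50 chance that a given patient has Disease A rather than Disease B:

```python
cost = 100            # dollars per course of Drug A
years_if_match = 10   # life extension if the drug matches the disease
years_if_miss = 0     # life extension if it does not
p_match = 0.5         # assumed prior that the patient has the matching disease

expected_years = p_match * years_if_match + (1 - p_match) * years_if_miss
print(expected_years / cost)  # 0.05 expected years of life per dollar
```

Drug B is the mirror image of Drug A, so the same calculation gives it the same 0.05 expected years of life per dollar.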

Then, we have Drug C, which also costs $100. Both Disease A and Disease B are somewhat responsive to Drug C, so this is a new and interesting drug. From Figure 1 we can see the expected value for Drug C is greater than the expected value for either Drug A or Drug B. So we think, “Okay, great. Kind of obvious we should prescribe Drug C.”

Suppose that Drug A and Drug B have already been heavily tested in numerous trials, they've been shown in meta-analyses to be highly effective, and the estimates in Figure 1 are extremely robust. Drug C, on the other hand, is completely new. It has only had a single trial, in which it increased patients' lives by many years; we assume the trial included patients with each of the two diseases.

Say you have a conservative prior about the effectiveness of a drug; you think, "In all likelihood, most drugs selected at random would be either net neutral or net negative." If you see one trial in which a drug massively extends someone's life, your prior might bring your estimate down to something like six years of extra life, regardless of whether the patient has Disease A or Disease B. We have the same expectation for Drug C as before, but suddenly it seems a bit more questionable whether we should prescribe it.

The idea that we should favor interventions with more evidence, and that expected utility theory can't capture this, is summed up in this GiveWell blog post from 2011:

“There seems to be nothing in explicit expected value that penalizes relative ignorance, or relatively poorly grounded estimates. If I can literally save a child I see drowning by ruining a $1,000 suit, but in the same moment make a wild guess that this $1,000 could save two lives if I put it toward medical research, then explicit expected value seems to indicate that I should opt for the latter.”

The idea is that there’s something wrong with expected value calculations because they kind of tell us to take wild guesses, as long as the expected value is higher. I want to argue that there are two claims that we might want to vindicate in these sorts of cases.

The first claim is one that I and hopefully you find quite plausible, and it’s the claim that evidence matters. How much evidence we have about an intervention can make a difference in deciding what we should do.

The second claim is one that I think is implied by the above quote, which is that we should favor more evidence, all else being equal. So, if the expected value of two interventions is similar, we should generally favor investing in interventions that have more evidence supporting them.

Maybe we can say that this is relevantly similar to the case we have with Drugs A, B, and C. In a case where you have a lot of evidence that Drug A and Drug B have the effects shown in Figure 1, this might favor giving one of these well-known drugs over a new one, such as Drug C, that has only been shown effective in a single trial.

I'm going to consider both of these claims, and whether expected value calculations can vindicate either or both of them. As mentioned at the beginning, I'm going to argue that they can support the first claim (that evidence matters) but that they actually reject the second claim (that we should favor interventions with more evidence, all else being equal).

Let's begin. I want to turn to this notion of resilience, and how we represent how much evidence we have, in terms of the credences we assign to propositions such as “This drug will cure this disease.”

See Figure 2. Take Case 1, which is an untested coin. I’ve given you no information about how biased this coin is. It could be completely biased in favor of heads, it could be completely biased in favor of tails, or it could be a completely fair coin. You have no information to distinguish between any of these hypotheses. It seems like, in this case where you have no idea about what the bias of a coin is, if I were to ask you, “What is the chance it lands heads on the next throw?”, you’re going to have to reply, “It’s about 50%", because you have no reason to favor a heads bias over a tails bias.

Now consider a different case, which is Case 2, the well-tested coin. When you flip the well-tested coin you get the following sequence: “Heads, heads, heads, tails, heads, heads, tails, tails,” until the coin has been flipped a million times. You had a very, very boring series of days flipping this coin.

For Case 1, in answer to the question “What’s the probability that the coin will land heads in the next flip?” you should say, "0.5 or 50%.” In Case 2, where you tested the coin a lot and it’s come up heads roughly 50% of the time, tails roughly 50% of the time, you should also say that the next flip is 50% likely to be heads.

The difference between the untested coin and the well-tested coin cases is reflected in the resilience levels of your credences. One simple formulation of resilience, call it credal resilience, is how stable you expect your credences to be in response to new evidence. If my credences are high resilience, then there's more stability: I don't expect them to vary much as new evidence comes in, even if the evidence is good and pertinent to the question. If they're low resilience, then they have low stability: I expect them to change a lot in response to new evidence. This is the case with the untested coin, where I have no data about its bias, so the resilience of my credence of 50% is fairly low.

It's worth noting that resilience levels can reflect either the set of evidence that you have about a proposition, or your prior about the proposition. For example, if you saw me pick the untested coin out of a stack of coins you know to be fair, you would have evidence that it's fair. But if you simply live in a world that doesn't include a lot of very biased coins, then your prior might be doing a lot of the work that your evidence would otherwise do. These are the two things that generate credal resilience.
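One way to make this notion precise, as a sketch in the spirit of Brian Skyrms' "resiliency" rather than a definition given in the talk, is to measure how far your credence in a hypothesis $H$ could move under the evidence propositions $E$ you might realistically acquire next:

$$\mathrm{Res}\big(P(H)\big) = 1 - \max_{E \in \mathcal{E}} \big|\,P(H \mid E) - P(H)\,\big|$$

A credence is resilient when no realistic piece of evidence would move it far, and fragile when some realistic piece of evidence would move it a lot.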

In both cases with the coins, your credence that the coin will land heads on the next flip is the same — it’s 0.5. Your credence of 0.5 about the tested coin is resilient, because you’ve done a million trials of this coin. Whereas, your credence about the untested coin is quite fragile. It could easily move in response to new evidence, as we can see in Figure 3.

Take this third case. You start to test the untested coin, so you perform a series of flips with the coin, and you start to see a pattern. In a case like this, it looks like the coin is pretty heavily heads biased, or you at least start to quite rapidly increase your credence that it’s heads biased. As a result, your credence that it’s going to come up heads next time is much higher. Because you had less evidence about the untested coin, your credence of 0.5 in it landing heads was much more fragile, and now your credence changes in response to evidence.

You wouldn't change your credence if you got this sequence of all-heads on the well-tested coin, because more evidence means that your credences are more resilient. If you saw a series of five heads after performing a million trials where it landed heads roughly half the time, this all-heads sequence is just not going to make a huge difference to what you expect the next coin flip to be.
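To make the asymmetry concrete, here is a minimal sketch using a Beta-Bernoulli model (a modeling choice of ours; the talk doesn't specify one). Both coins start with a credence of 0.5 in heads, but five heads in a row move the untested coin's credence far more than the well-tested coin's:

```python
from fractions import Fraction

def posterior_prob_heads(prior_heads, prior_tails, new_heads, new_tails):
    """Posterior probability of heads under a Beta(prior_heads, prior_tails) prior."""
    return Fraction(prior_heads + new_heads,
                    prior_heads + prior_tails + new_heads + new_tails)

# Case 1: untested coin -- a flat Beta(1, 1) prior over its bias.
# Case 2: well-tested coin -- roughly 500,000 heads and 500,000 tails already observed.
untested = posterior_prob_heads(1, 1, new_heads=5, new_tails=0)
tested   = posterior_prob_heads(500_000, 500_000, new_heads=5, new_tails=0)

print(float(untested))  # ~0.857: five heads move the fragile 0.5 a long way
print(float(tested))    # ~0.500: the same five heads barely move the resilient 0.5
```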

I think credal resilience has some interesting implications. A lot of people seem unwilling to assert probability estimates about whether something is going to work or not. I think a really good explanation for this is that, in cases where we don't have a lot of evidence, we have little confidence in our own credences.

In essence, we think it’s very likely that our credences are going to move around a lot in response to new evidence. We’re not willing to assert a credence that we think is simply going to be false or inaccurate as soon as we gain a little bit more evidence. Sometimes people think you have mushy credences, that you don’t actually have precise probabilities that you can assign to claims such as “This intervention is effective to Degree N.” I actually think resilience might be a good way of explaining that away, by instead claiming, “No. You can have really precise estimates. You just aren’t willing to assert them.”

This has a huge influence on the value of information, which is the theme of this piece. Our drugs scenario is supposed to be somewhat analogous to different altruistic interventions.

In the original case, we had expected values of 0.05, 0.05, and 0.06 years of life per dollar for the three drugs. One thing we can do here is gain valuable evidence about the world. Consider a case where diagnosis is invented, at least as far as Disease A and Disease B are concerned: we can now determine whether you have Disease A or Disease B, at an additional cost of $60. Given this, if I diagnose you and you have Disease A, I can expect you to live for another 10 years, because I can then pay an additional $100 to give you Drug A. If you have Disease B, I can pay an additional $100 to give you Drug B.

In this case, the value of diagnosis, including the cost of then curing you of the disease, is higher than that of any of the original interventions. Hopefully it is intuitive that, rather than giving you Drug A, Drug B, or Drug C, I should diagnose you and give you the correct drug.
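Under the same 50/50 assumption about which disease a patient has, a quick sketch of the per-dollar arithmetic shows why (the six-year figure for Drug C is the post-prior estimate from earlier):

```python
p_disease_a = 0.5

# Prescribing Drug A blindly: 10 extra years if the patient has Disease A, 0 otherwise.
blind_drug_a = (p_disease_a * 10) / 100        # 0.05 expected years per dollar
# Drug C (after the conservative prior): roughly 6 extra years whichever disease it is.
blind_drug_c = 6 / 100                         # 0.06 expected years per dollar
# Diagnose for $60, then spend $100 on the matching drug and get the full 10 years.
diagnose_then_treat = 10 / (60 + 100)          # 0.0625 expected years per dollar

print(blind_drug_a, blind_drug_c, diagnose_then_treat)
```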

The previous example was concerned with information about the world, which we think is valuable anyway. Now suppose I care about global poverty and I want to discover good interventions. To find out more about the world, I could, for example, research the deficiencies that exist in India and see if there are good ways to address them.

A different way we can gain valuable information is by finding out about interventions themselves [Figure 5]. An example would be to look at the actual intervention of Drug C and how effective it is.

Suppose, in our ideal world, that it costs $5,000 to run a trial of Drug C that would bring you to certainty about its effectiveness. Suppose, also, that somehow you know it can only be very low impact or very high impact. You have a credence of about 0.5 that the results will show Drug C only extends life by two years in both Disease A and Disease B. This is our skeptical prior. But you also have a credence of about 0.5 that the drug will extend life by 10 years in both cases. We will also assume diagnosis has gone out the window.

Now imagine the scenario where you are currently prescribing Drug C. (You obviously don't exist in any modern medical system, since you're ignoring the fact that there is little evidence here and going with the expected value as is.) The question is then: what is the value of running a trial of Drug C, given that you're already prescribing it? If your credence in low impact goes to 1, that is, you suddenly discover that this drug is much less effective than you initially thought, then you're going to switch from Drug C back to prescribing Drug A or B.

In that case, having run the trial, the per-patient benefit goes from two to five years of expected life. Whereas if Drug C is a low-impact drug and you don't perform the trial, you don't spend the $5,000, but you only get two years of additional life per $100: every time you see Disease A or Disease B, you continue to prescribe Drug C and it only gives people an additional two years of life. Alternatively, if it is high impact, then you will have been accidentally prescribing something very good, providing ten years of additional life.

We can see that the trial adds 1.5 years of expected life per future treatment. Therefore, if there are more than 2,000 patients, investing in a trial of Drug C is better than giving any of the drugs currently available. In this case the information value swamps the direct value of the intervention.
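The arithmetic behind those figures, sketched with the numbers above (our reconstruction of the talk's setup rather than an exact reproduction of its slides):

```python
# Value-of-information sketch for the Drug C trial: a 0.5/0.5 prior over a
# 2-year versus 10-year drug, Drug A or B worth 5 expected years without
# diagnosis, and a one-off trial cost of $5,000.

p_low, p_high = 0.5, 0.5

# Without the trial you keep prescribing Drug C whichever world you are in.
ev_without_trial = p_low * 2 + p_high * 10     # 6.0 expected years per patient

# With the trial you learn which world you are in: switch back to Drug A or B
# (5 expected years) if Drug C is low impact, keep Drug C if it is high impact.
ev_with_trial = p_low * 5 + p_high * 10        # 7.5 expected years per patient

gain_per_patient = ev_with_trial - ev_without_trial   # 1.5 expected years per patient

def total_information_value(n_future_patients: int) -> float:
    """Expected life-years added by the trial across all future patients."""
    return gain_per_patient * n_future_patients

# The information value grows with the number of future patients and soon dwarfs
# the one-off $5,000 cost of obtaining it.
print(gain_per_patient, total_information_value(2_000))   # 1.5, 3000.0
```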

The value of investing in trials of Drug A or B is going to be negligible, because our credences about their effectiveness are already resilient. This builds up to the rather counterintuitive conclusion that expected value calculations might say that, in cases where all else is equal, we should favor investing in interventions that have less evidence, rather than interventions that have more.

This means that, if the expected "concrete value" of two interventions is similar (I'm going to use concrete value to mean non-informational value), we should generally favor investing in the intervention that has less evidence supporting it. The idea is that in such scenarios the concrete values of the interventions are the same, but the information value of one of them is much higher, namely the one where a much lower-resilience credence is generating your expected value calculation.

This gets us to the "what does it mean, and what should we do?" part. Hopefully I have convinced you that, despite the intuitive appeal of favoring interventions with more evidence, there is actually an argument that we should favor interventions with less.

When considering information value, there are three options available to us: "explore, exploit or evade" [Figure 7].

We can choose to explore; we can invest resources in interventions primarily for their information value. This means things like research, funding to gather data, and career trials. We can exploit, which means investing resources in an intervention for its concrete value. That means things like large project grants and entire career choices. Or we can evade; we can decide just not to invest in a given intervention — we either invest elsewhere, or completely delay investment.

The main difference between these three options is the reason for action. Take three people. Amy donates $100 to an existential risk charity to protect the future of humanity. She is simply exploiting: she looks only at the direct concrete value of this intervention.

Bella donates $100 to the same charity to find out how much good they can do in the world. She is donating mainly to explore and then later exploit. She'll think to herself, "Let's see how valuable this is", and if it is very valuable, then she will mine the value of it.

Carla donates $100 to the same charity, but for the reason of getting humanity to have more time to discover what the best causes are. She is exploiting the direct value the charity does in reducing existential risk in order to have more time to discover what the best causes are and then exploit those. In essence, she is exploiting to explore to exploit.

When is exploring especially cost effective?

Essentially, when there are three features:

  1. When there is more uncertainty about the direct value of an intervention. This means options that have high expected value, but low resilience.
  2. When there are high benefits of certainty about the direct value. We would then be able to repeatedly mine it for value.
  3. When there are low information costs. This means the information is not too costly to obtain and the delay is low cost (you don't want to be looking for information when cars are driving towards you, as the cost of not taking action and getting out of the way is high!).
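One way to put these three conditions together, as a toy formalization of our own rather than anything from the talk: exploring pays when the expected per-use gain from better information, multiplied by the number of future uses you expect, exceeds what the same resources would have produced if spent directly on the best current intervention, plus any cost of delay.

```python
# A toy model (ours, not the talk's) of the explore-versus-exploit comparison.
# All quantities are in the same units of value (e.g. expected life-years).

def explore_is_worthwhile(gain_per_future_use: float,
                          expected_future_uses: float,
                          forgone_direct_value: float,
                          delay_cost: float = 0.0) -> bool:
    """True if the expected value of information exceeds its opportunity cost."""
    return gain_per_future_use * expected_future_uses > forgone_direct_value + delay_cost

# The Drug C trial from earlier: 1.5 expected years gained per future patient,
# 2,000 future patients, and a $5,000 trial that would otherwise have bought
# 50 treatments worth roughly 6 expected years each.
print(explore_is_worthwhile(1.5, 2_000, 50 * 6))   # True
```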

The question I have is: "Is gaining information especially valuable for effective altruists?" Another way to put it is: "Is information essentially its own cause area, within effective altruism?"

There is a lot of uncertainty within and across good cause areas, particularly if we consider long-term indirect effects. We don't know about the long-term indirect effects of a lot of our interventions.

The benefits of certainty are high, as we expect to use this information in the long term. Effective altruism is a multi-generational project as opposed to a short-term intervention. So, you expect the value of information to be higher, because people can explore for longer and find optimal interventions.

To some degree there are low information costs while the movement is young and there is still a lot of low-hanging fruit. This comes with caveats. Maybe you're a bit like Carla and you're very worried that we're screwing up the climate, or that nuclear war is going to go terribly wrong, in which case, maybe you think we should be directly intervening in those areas.

What difference would exploring more make to effective altruism?

I think we could probably invest a lot more time and resources in interventions that are plausibly good, in order to get more evidence about them. We should probably do more research, although I realise this point is somewhat self-serving. For larger donors, this probably means diversifying their giving more if the value of information diminishes steeply enough, which I think might be the case.

Psychologically, I think we should be a bit more resilient to failure and change. When people consider the idea that they might be giving to cause areas that could turn out to be completely fruitless, I think they find it psychologically difficult. In some ways, just thinking, "Look, I'm just exploring this to get the information about how good it is, and if it's bad, I'll just change" or "If it doesn't do as well as I thought, I'll just change" can be quite comforting if you worry about these things.

The extreme view you could take is: "We should just start investing time and money in interventions with high expected value but little or no evidential support." A more modest proposal, which I tentatively endorse, is: "We should start explicitly including the value of information in our assessments of causes and interventions, rather than treating it as an afterthought to concrete value." In my experience, information value can swamp concrete value, and if that is the case it shouldn't be an afterthought in your calculations; it should be one of the primary drivers of value.

In summary, evidence does make a difference to expected value calculations, via the value of information. If the expected concrete value of two interventions is the same, this favors testing the intervention with less evidential support rather than the one with more. Taking the value of information seriously would change what effective altruists invest their resources in, be it time or money.

Q&A

Question: What does it mean to have a credence in a credence — for example, an 80% chance that there is a 50% chance of it working, and so on? Does it recurse down to zero?

Amanda Askell: It's not that you have a credence in a credence; it's your credence that your credence will stay the same or change in response to new evidence. There are a lot of related concepts here, such as your credence about the accuracy of your credence. So it's not "I have a credence that I have a credence of 0.8." It's a separate thing: my credence that, in response to this trial, I will adjust my credence from 0.5 to either 0.7 or 0.2 is the kind of credence I'm talking about.

Question: Do you think there's a way to avoid falling into the rabbit hole of the nesting credences of the kind that the person might have been referring to?

Amanda Askell: I guess my view, in the boring philosophical jargon, is that credences are dispositional. So, I do think that you probably have credences over infinitely many propositions. I mean, if I actually ask you about the proposition, you'll give me an answer. So, this is a really boring kind of answer, which is to say, "No, the rabbit hole totally exists and I just try and get away from it by giving you a weird non-psychological account of credences."

Question: Is information about the resilience captured by a full description of your current credences across the hypothesis space? If not, is there a parsimonious way to convey the extra information about resilience?

Amanda Askell: I'm trying to think about the best way of parsing that. Let's imagine that I'm just asking for your credence that the intervention has value N, for each N I'm considering. That will not capture the resilience of your credences, because resilience is about how you think they will adjust in response to new evidence. If you also include how your credences will adjust in response to new evidence, then yes, that should cover resilience. So it just depends on how you're carving up the space.
