
Cross-posted from my Substack

Summary. I state the “Ideal Reflection” principle, which says that our beliefs should match our expectations of what an ideal version of ourselves would believe. I argue that Ideal Reflection can help rationalize how Effective Altruists use the concept of expected value, and that whether we accept Ideal Reflection has important real-world implications.

Here’s an aspect of Effective Altruist discourse that has confused me for the longest time: EAs will often talk about “the” expected value of an intervention, or otherwise talk in a way that suggests that expected value is an objective feature of the world. But this clashes with the usual Bayesian story, where expected values are a matter of subjective belief.

For example, the 80,000 Hours page on expected value says (my emphasis):

Making explicit estimates of expected value is sometimes useful as a method — such as when you’re comparing two global health interventions — but it’s often better to look for useful rules of thumb and robust arguments, or even use gut intuitions and snap decisions to save time… little of our work at 80,000 Hours involves explicit or quantitative estimates of the impact of different careers. Rather, we focus on finding good proxies for expected value, such as the ‘importance, neglectedness, and tractability’ framework for comparing problems, or the concepts of leverage and career capital.

Not only does this passage talk about expected value as if it is objective, it talks about expected value estimates sometimes being “useful”. “Useful” by what standard? Does “useful” mean… “leads to greater expected value”? If so, how do we know, since it seems like we’re saying we don’t know “the” expected value? If not, what is the standard of usefulness? What is going on?

One way of making sense of this is to bring in an ideal version of ourselves. “The” expected value is the expected value that would be assigned by an agent that has the same evidence as us, but is a vastly more powerful reasoner. Something like a perfect Bayesian who can reason over a ~maximally granular and exhaustive set of hypotheses, and has seen everything we’ve seen. So, “the” expected values are still subjective (they’re the ideal agent’s subjective expectations), in line with standard Bayesianism. At the same time, they’re a kind of objective fact, since we are imagining that there is some objective fact of the matter about what an ideal version of ourselves would believe.

By itself, that doesn’t get us far. How can the ideal agent’s expectations, unknown to us, be action-guiding? You could say we should try to “approximate” the ideal. But what notion of “closeness” to the ideal are we talking about? Why is that notion of closeness normatively relevant? And in what sense are we justified in believing that any particular strategy actually gets us “closer” to the ideal agent? (See also Violet Hour on “approximating” the ideal.)

Here’s what I think one should say. We can at least form beliefs about what the ideal agent would think. We can say, for example, “It really seems to me like this intervention is great, but when I think about all the different considerations the ideal agent will have thought of, I’m pretty unsure what they would think.” Then we can require that our expectations should match our best guesses about the ideal agent’s expectations.

We don’t want to assume that people can always come up with numerical expected values, though. So we will just assume that people can make comparative expected value judgements, like “I expect this intervention to be better than that one”. This is in line with how EAs talk about expected value, acknowledging we often can’t come up with all-things-considered numerical values but still making comparative judgements like “The expected sign of so-and-so is positive” (example).[1]

Our principle is then:

Ideal Reflection. Our expectations should match our expectations of the ideal agent’s expected values. More precisely: Let X and Y be unknown quantities (like total welfare under two different interventions), and let E*[X] and E*[Y] be the ideal agent’s (unknown) expectations for these quantities. Then, our expectation for X is greater than our expectation for Y if and only if our expectation for E*[X] is greater than our expectation for E*[Y].[2]
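In symbols, the comparative principle just stated (with E for our expectations and E* for the ideal agent’s) is:

```latex
\[
  E[X] > E[Y] \;\iff\; E\big[E^{*}[X]\big] > E\big[E^{*}[Y]\big]
\]
```

Note that both sides involve only our expectations; E*[X] and E*[Y] enter as unknown quantities we have beliefs about, just like X and Y themselves.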

In epistemology, “reflection” principles say that we should defer, in some sense, to other versions of ourselves. The original reflection principle says that, if we know our future self would have some belief, we should adopt it too. The “Awareness Reflection” principle says that we should defer to a version of ourselves that has conceived of (is “aware” of) more possibilities than we have (Steele and Stefánsson 2021, chap. 7). Ideal Reflection takes this to its natural conclusion: if we’re already willing to defer to an epistemically superior version of ourselves, why not defer to the most epistemically superior version?
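For reference, the original (van Fraassen-style) reflection principle is standardly formalized as something like the following, where P_t is our credence function now and P_t′ our credence function at a later time t′ (my gloss in standard notation; the post states the principle only informally):

```latex
\[
  P_{t}\big(A \,\big|\, P_{t'}(A) = r\big) = r \qquad \text{for } t' > t
\]
```

Read: conditional on learning that your later self will assign credence r to A, assign credence r to A now.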

I’m not the first to consider deferring to some kind of epistemic ideal, of course. After a quick look through the philosophy literature I couldn’t find the exact same formulation, but Christensen’s (2010) Rational Reflection and Spohn’s (2023) Full Reflection are closely related. And in their excellent “Effective Altruism’s Implicit Epistemology”, Violet Hour discusses a very similar potential rationalization of EA practice.

Returning to the exegesis of the 80K quote above: We have to somehow form beliefs about what the ideal agent would believe. Sometimes forming these beliefs can involve doing explicit expected value calculations in simple models (“expected value estimates”), and sometimes it can involve thinking about the kinds of “proxies” the 80K quote mentions. When 80K says that explicit expected value estimates can be “useful”, they are saying that we can sometimes treat them as informing our all-things-considered beliefs about the ideal agent’s expectations.
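To make the mechanics concrete, here is a minimal toy sketch in Python. It is my illustration, not anything from 80K or the post: every intervention name, credence, and value below is made up, and the post itself only assumes comparative judgements rather than precise numbers. The shape of the computation is the point: the explicit model’s output enters only as one scenario among several about what the ideal agent would conclude.

```python
# Toy sketch of Ideal Reflection in practice. All names and numbers are hypothetical.

# For each intervention, spread our credence over a few coarse scenarios about what
# the ideal agent's expectation E*[total welfare] would turn out to be.
# Each entry is (our credence in the scenario, the E* we'd expect in that scenario).
beliefs_about_ideal = {
    "direct_health_intervention": [
        (0.7, 100.0),   # our explicit EV estimate is roughly what the ideal agent would get
        (0.3, 40.0),    # the ideal agent sees offsetting considerations our model misses
    ],
    "policy_advocacy": [
        (0.2, 1000.0),  # advocacy works far better than our proxies suggest
        (0.5, 50.0),    # importance/neglectedness/tractability proxies are about right
        (0.3, -200.0),  # backfire scenarios the ideal agent would take seriously
    ],
}

def expected_ideal_expectation(scenarios):
    """Our expectation of the ideal agent's expectation, E[E*[X]]."""
    return sum(credence * ideal_ev for credence, ideal_ev in scenarios)

for name, scenarios in beliefs_about_ideal.items():
    print(f"{name}: E[E*] = {expected_ideal_expectation(scenarios):.1f}")

# Ideal Reflection (comparative form): rank options by E[E*[.]], rather than by
# the output of any single explicit model or proxy on its own.
ranking = sorted(
    beliefs_about_ideal,
    key=lambda name: expected_ideal_expectation(beliefs_about_ideal[name]),
    reverse=True,
)
print("Ranking under Ideal Reflection:", ranking)
```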

It’s plausible to me that we should satisfy Ideal Reflection. Even if people aren’t thinking in terms of Ideal Reflection, I think that they probably should be! But so what? We’ve got a clearer way of understanding EA talk about “the” expected values of different interventions. Who cares? What does it matter for what we do in practice?

I think it might matter a lot. Here’s a quick survey of questions Ideal Reflection raises, some of which I’ll likely flesh out in later posts:

  • It’s not so easy to make sense of what “the ideal agent” could even mean! Among other things, we would need to specify “the ideal” through some sort of reflection process (e.g., for reflecting on normative principles for choosing priors, for updating on self-locating evidence, and so on). And, as Joe Carlsmith argues, it is likely that there is no unique privileged reflection process.
    • If the most coherent version of EA epistemology and decision theory is centrally about forming beliefs about an ideal agent, but this concept doesn’t make any sense, then something has to give.
    • And even if indeterminacy in “the ideal” is acceptable, it may have important implications, such as making our own expectations more indeterminate and pushing towards cluelessness.
  • If we reject Ideal Reflection, what does that imply? What do Effective Altruists mean by expected-value talk? Is it coherent? And aren’t we still going to be able to imagine some hypothetical agents to whom we want to defer, even if not the “ideal” one? Which ones are they, exactly, and what follows for what we should believe?
  • Ideal Reflection gives us a clear practical recommendation: When forming beliefs, don’t just ask yourself “What do I think about this?”; ask yourself, “What do I think the ideal agent (who reasons over vastly more hypotheses than we do, may think in entirely different ontologies, and so on) would believe about this?” If people did this consistently, I think they might end up way more uncertain about the net effects of interventions.
    • In fact, I think this line of thought is a good argument that we should have highly indeterminate beliefs and therefore be clueless, for basically the reasons given here.
  • If Ideal Reflection is right, it should probably inform how we design AI systems to reason about philosophically thorny topics, which could be relevant to making sure they don’t go off the rails in various ways. For example, Ideal Reflection is closely related to Anthony’s discussion of “(really) open-minded updatelessness” here.

References

Christensen, David. 2010. “Rational Reflection.” Philosophical Perspectives 24 (1): 121–140.

Spohn, Wolfgang. 2023. “A Generalization of the Reflection Principle.” Dialectica 77 (1): 73–96.

Steele, Katie, and H. Orri Stefánsson. 2021. Beyond Uncertainty: Reasoning with Unknown Possibilities. Cambridge: Cambridge University Press.

  1. ^

    Still, it’s a bit mysterious what these comparative judgements are. What makes them comparative expectations (i.e., an appropriate adaptation of the usual notion of “expected value”), rather than comparative medians or modes or whatever? For starters, it seems like comparative expectations should result from some kind of “weighing up” of the plausibility times magnitude of different outcomes, even if crude. But I don’t have a full account to give.
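
    One crude way such a weighing could go (my sketch, not the author’s account): if X’s salient outcomes have plausibilities p_i and magnitudes v_i, and Y’s have q_j and w_j, judge

    ```latex
    \[
      X \succsim Y \quad \text{whenever} \quad \sum_i p_i\, v_i \;\ge\; \sum_j q_j\, w_j
    \]
    ```

    with the caveat already flagged above: this is at best a partial gloss on what the weighing amounts to.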

  2. ^

    This looks kind of like the Law of Total Expectation, according to which a perfectly Bayesian agent’s expectations will equal the expected value of their expectations given some future evidence, E[X] = E[E[X | Future Evidence]]. But it isn’t the same thing! Here, we are uncertain about what we would believe if we were to become arbitrarily good reasoners, holding our evidence fixed.
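
    Side by side, in my notation (the inner expectation in the first line is conditional on possible future evidence; E* in the second is the ideal agent’s expectation with our current evidence held fixed):

    ```latex
    \[
      \text{Law of Total Expectation:}\quad E[X] = E\big[\, E[X \mid \text{future evidence}] \,\big]
    \]
    \[
      \text{Ideal Reflection (comparative):}\quad E[X] > E[Y] \;\iff\; E\big[E^{*}[X]\big] > E\big[E^{*}[Y]\big]
    \]
    ```

    The first is an equality a coherent Bayesian satisfies automatically; the second is a substantive normative constraint, and only a comparative one.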
