Senior research analyst at Open Philanthropy. Doctoral student in philosophy at the University of Oxford. Opinions my own.





On infinite ethics

A few questions about this: 

  1. Does this view imply that it is actually not possible to have a world where e.g. a machine creates one immortal happy person per day, forever, who then form an ever-growing line?
  2. How does this view interpret cosmological hypotheses on which the universe is infinite? Is the claim that actually, on those hypotheses, the universe is finite after all? 
  3. It seems like lots of the (countable) worlds and cases discussed in the post can simply be reframed as never-ending processes, no? And then similar (identical?) questions will arise? Thus, for example, w5 is equivalent to a machine that creates a1 at -1, then a3 at -1, then a5 at -1, etc. w6 is equivalent to a machine that creates a1 at -1, then a2 at -1, a3 at -1, etc. What would this view say about which of these machines we should create, given the opportunity? How should we compare these to a w8 machine that creates b1 at -1, b2 at -1, b3 at -1, b4 at -1, etc?

Re: the Jaynes quote: I'm not sure I've understood the full picture here, but in general, to me it doesn't feel like the central issues here have to do with dependencies on "how the limit is approached," such that requiring that each scenario pin down an "order" solves the problems. For example, I think that a lot of what seems strange about Neutrality-violations in these cases is that even if we pin down an order for each case, the fact that you can re-arrange one into the other makes it seem like they ought to be ethically equivalent. Maybe we deny that, and maybe we do so for reasons related to what you're talking about - but it seems like the same bullet. 

Listen to more EA content with The Nonlinear Library

Thanks for doing this! I've found it useful, and I expect that it will increase my engagement with EA Forum/LW content going forward.

SIA > SSA, part 3: An aside on betting in anthropics

"that just indicates that EDT-type reasoning is built into the plausibility of SIA"

If by this you mean "SIA is only plausible if you accept EDT," then I disagree. I think many of the arguments for SIA -- for example, "you should be 1/4 on each of tails-mon, tails-tues, heads-mon, and heads-tues in Sleeping Beauty with two wakings each, and then update to being a thirder if you learn you're not in heads-tues," "telekinesis doesn't work," "you should be one-half on not-yet-flipped fair coins," "reference classes aren't a thing," etc. -- don't depend on EDT, or even on EDT-ish intuitions.

you talk about contorting one's epistemology in order to bet a particular way, but what's the alternative? If I'm an EDT agent who wants to bet at odds of a third, what is the principled reasoning that leads me to have credence of a half?

The alternative is to just bet the way you want to anyway, in the same way that the (most attractive, imo) alternative to two-boxing in transparent newcomb is not "believe that the boxes are opaque" but "one-box even though you know they're transparent." You don't need to have a credence of a half to bet how you want to -- especially if you're updateless. And note that EDT-ish SSA-ers have the fifthing problem too, in cases like the "wake up twice regardless, then learn that you're not heads-tuesday" version I just mentioned (where SSA ends up at 1/3rd on heads, too).

You argue that questions like "could I have been a chimpanzee" seem ridiculous. But these are closely analogous to the types of questions that one needs to ask when making decisions according to FDT (e.g. "are the decisions of chimpanzees correlated with my own?") So, if we need to grapple with these questions somehow in order to make decisions, grappling with them via our choice of a reference class doesn't seem like the worst way to do so.

I think that "how much are my decisions correlated with those of the chimps?" is a much more meaningful and tractable question, with a much more determinate answer, than "are the chimps in my reference class?" Asking questions about correlations between things is the bread and butter of Bayesianism. Asking questions about anthropic reference classes isn't -- or, doesn't need to be.

I'm reminded of Yudkowsky's writing about why he isn't prepared to get rid of the concept of "anticipated subjective experience", despite the difficulties it poses from a quantum-mechanical perspective.

Thanks for the link. I haven't read this piece, but fwiw, to me it feels like "there is a truth about the way that the world is/about what world I'm living in, I'm trying to figure out what that truth is" is something we shouldn't give up lightly. I haven't engaged much with the QM stuff here, and I can imagine it moving me, but "how are you going to avoid fifth-ing?" doesn't seem like a strong enough push on its own.

SIA > SSA, part 1: Learning from the fact that you exist

It’s a good question, and one I considered going into in more detail in the post (I'll add a link to this comment). I think it’s helpful to have in mind two types of people: “people who see the exact same evidence you do” (e.g., they look down on the same patterns of wrinkles on your hands, the same exact fading on the jeans they’re wearing, etc) and “people who might, for all you know about a given objective world, see the exact same evidence you do” (an example here would be “the person in room 2”). By “people in your epistemic situation,” I mean the former. The latter I think of as actually a disguised set of objective worlds, which posit different locations (and numbers) of the former-type people. But SIA, importantly, likes them both (though on my gloss, liking the former is more fundamental).

Here are some cases to illustrate. Suppose that God creates either one person in room 1 (if heads) or two people (if tails) in rooms 1 and 2. And suppose that there are two types of people: “Alices” and “Bobs.” Let’s say that any given Alice sees the exact same evidence as the other Alices (the same wrinkles, faded jeans, etc), and that the same holds for Bobs, and that if you’re an Alice or a Bob, you know it. Now consider three cases: 

  1. For each person God creates, he flips a second coin. If it’s heads, he creates an Alice. If tails, a Bob. 
  2. God flips a second coin. If it’s heads, he makes the person in room 1 Alice; if tails, Bob. But if the first coin was tails and he needs to create a second person, he makes that person different from the first. Thus, if tails-heads, it’s an Alice in room 1, and a Bob in room 2. But if it’s tails-tails, then it’s a Bob in room 1, and an Alice in room 2. (I talk about this case in part 4, XV.)
  3. God creates all Alices no matter what. 

Let’s write people’s names with “A” or “B,” in order of room number. And let’s say you wake up as an Alice. 

  • In case one, “coin 1 heads” (I’ll write the coin-1 results in parentheses) corresponds to two objective worlds -- A and B -- each with 1/4 prior probability. Coin 1 tails corresponds to four objective worlds -- AA, AB, BA, and BB -- each with 1/8th prior probability. So as Alice, you start by crossing off B and BB, because they contain no Alices. You’re left with 1/4 on A, and 1/8th on each of AA, AB, and BA, for overall odds of 2:1:1:1. But now, as SIA, you scale the prior in proportion to the number of Alices there are, so AA gets double weight. Now you’re 2:2:1:1. Thus, you end up with 1/3rd on A, 1/3rd on AA (with 1/6th on each of the corresponding centered worlds), and 1/6th on each of AB and BA. And you’re a “thirder” overall.
  • Now let’s look at case two. Here, the prior is 1/4 on A, 1/4 on B, 1/4 on AB, and 1/4 on BA. So SIA doesn’t actually do any scaling of the prior: there’s a maximum of one A in each world. Rather, it crosses off B, and ends up with 1/3rd on anything else, and stays a “thirder” overall. 
  • Case three is just Sleeping Beauty: SIA scales in proportion to the number of Alices, and ends up a thirder overall. 
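The three calculations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the world labels (strings such as "AB", listing room occupants in room order) follow the notation above, but the function names `sia_posterior` and `p_heads` are hypothetical, and the rule implemented is just "weight each objective world by its prior times the number of Alices in it, then renormalize."

```python
from fractions import Fraction as F

def sia_posterior(prior, observer="A"):
    """Scale each world's prior by how many `observer`s it contains, then normalize.

    Worlds with no such observer get weight 0, which is the "crossing off" step.
    """
    weights = {world: p * world.count(observer) for world, p in prior.items()}
    total = sum(weights.values())
    return {world: w / total for world, w in weights.items() if w > 0}

def p_heads(posterior):
    # Coin 1 heads creates exactly one person, so heads-worlds have a one-letter label.
    return sum(p for world, p in posterior.items() if len(world) == 1)

# Priors over objective worlds, as described in the three cases (illustrative encoding).
case1 = {"A": F(1, 4), "B": F(1, 4),
         "AA": F(1, 8), "AB": F(1, 8), "BA": F(1, 8), "BB": F(1, 8)}
case2 = {"A": F(1, 4), "B": F(1, 4), "AB": F(1, 4), "BA": F(1, 4)}
case3 = {"A": F(1, 2), "AA": F(1, 2)}

for name, prior in [("case 1", case1), ("case 2", case2), ("case 3", case3)]:
    post = sia_posterior(prior)
    print(name, {w: str(p) for w, p in post.items()}, "P(heads) =", p_heads(post))
```

In all three encodings the credence in coin 1 heads comes out to 1/3, even though the intermediate distributions over objective worlds differ.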

So in each of these cases, SIA gives the same result, even though the distribution of Alices is in some sense pretty different. And notice, we can redescribe case 1 and 2 in terms of SIA liking “people who, for all you know about a given objective world, might be an Alice” instead of in terms of SIA liking Alices. E.g., in both cases, there are twice as many such people on tails. But importantly, their probability of being an Alice isn’t correlated with coin 1 heads vs. coin 1 tails. 

Anthropics cases are sometimes ambiguous about whether they’re talking about cases of type 1 or of type 3. God’s coin toss is closer to case 1: e.g., you wake up as a person in a room, but we didn’t specify that God was literally making exact copies of you in the other rooms -- your reasoning, though, treats his probability of giving any particular objective-world person your exact evidence as constant across people. Sleeping Beauty is often treated as more like case 3, but it’s compatible with being more of a case 1 type (e.g., if the experimenters also flip another coin on each waking, and leave it for Beauty to see, this doesn’t make a difference; and in general, the Beauties could have different subjective experiences on each waking, as long as -- as far as Beauty knows -- these variations in experience are independent of the coin toss outcome). I'm not super careful about these distinctions in the post, partly because actually splitting out all of the possible objective worlds in type-1 cases isn't really do-able (there's no well-defined distribution that God is "choosing from" when he creates each person in God's coin toss -- but his choice is treated, from your perspective, as independent from the coin toss outcome); and as noted, SIA's verdicts end up the same.

Can you control the past?

Cool, this gives me a clearer picture of where you're coming from. I had meant the central question of the post to be whether it ever makes sense to do the EDT-ish try-to-control-the-past thing, even in pretty unrealistic cases -- partly because I think answering "yes" to this is weird and disorienting in itself, even if it doesn't end up making much of a practical difference day-to-day; and partly because a central objection to EDT is that the past, being already fixed, is never controllable in any practically-relevant sense, even in e.g. Newcomb's cases. It sounds like your main claim is that in our actual everyday circumstances, with respect to things like the WWI case, EDTish and CDT recommendations don't come apart -- a topic I don't spend much time on or have especially strong views about.

"you’re going to lean on the difference between 'cause' and 'control'" -- indeed, and I had meant the "no causal interaction with" part of the opening sentence to indicate this. It does seem like various readers objected to, or were confused by, the use of the term "control" here, and I think there's room for more emphasis early on as to what specifically I have in mind. But at a high level, I'm inclined to keep the term "control," rather than trying to rephrase things solely in terms of e.g. correlations. One reason is that I think it makes sense to think of yourself as, for practical purposes, "controlling" what your copy writes on his whiteboard, what Omega puts in the boxes, etc. Another is that, more broadly, EDT-ish decision-making is in fact weird in the way that trying to control the past is weird, and this makes it all the more striking and worth highlighting that EDT-ish decision-making seems, sometimes, like the right way to go.

Can you control the past?

Not sure exactly what words people have used, but something like this idea is pretty common in the non-CDT literature, and I think e.g. MIRI explicitly talks about "controlling" things like your algorithm.

Can you control the past?

I think this is an interesting objection. E.g., "if you're into EDT ex ante, shouldn't you be into EDT ex post, and say that it was a 'good action' to learn about the Egyptians, because you learned that they were better off than you thought in expectation?" I think it depends, though, on how you are doing the ex post evaluation: and the objection doesn't work if the ex post evaluation conditions on the information you learn. 

That is, suppose that before you read Wikipedia, you were 50% that the Egyptians were at 0 welfare, and 50% that they were at 10 welfare, so 5 in expectation, but reading is 0 EV. After reading, you find out that their welfare was 10. OK, should we count this action, in retrospect, as worth 5 welfare for the Egyptians? I'd say no, because the ex post evaluation should go: "Granted that the Egyptians were at 10 welfare, was it good to learn that they were at 10 welfare?". And the answer is no: the learning was a 0-welfare change.
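To make the arithmetic explicit, here is a minimal sketch. The variable names are hypothetical, and the only modeling assumption is that the Egyptians' welfare is the same whether or not you read about it.

```python
from fractions import Fraction as F

# Credences before reading: welfare level -> probability.
prior = {0: F(1, 2), 10: F(1, 2)}
expected_before = sum(welfare * p for welfare, p in prior.items())

# What you learn from reading.
observed = 10

# The tempting "good news" accounting: posterior expectation minus prior expectation.
naive_news_value = observed - expected_before

# Ex post evaluation that conditions on what was learned: compare the Egyptians'
# welfare given that you read vs. given that you didn't, holding the fact fixed.
welfare_if_read = observed
welfare_if_not_read = observed  # assumption: reading does not alter their welfare
ex_post_value = welfare_if_read - welfare_if_not_read

print(expected_before, naive_news_value, ex_post_value)
```

The gap of 5 shows up only if the evaluation forgets what it has since learned; once the evaluation conditions on the Egyptians being at 10, the act of reading contributes nothing.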

Can you control the past?

"the emphasis here seems to be much more about whether you can actually have a causal impact on the past" -- I definitely didn't mean to imply that you could have a causal impact on the past. The key point is that the type of control in question is acausal. 

I agree that many of these cases involve unrealistic assumptions, and that CDT may well be an effective heuristic most of the time (indeed, I expect that it is). 

I don't feel especially hung up on calling it "control" -- ultimately it's the decision theory (e.g., rejecting CDT) that I'm interested in. I like the word "control," though, because I think there is a very real sense in which you get to choose what your copy writes on his whiteboard, and that this is pretty weird; and because, more broadly, one of the main objections to non-CDT decision theories is that it feels like they are trying to "control" the past in some sense (and I'm saying: this is OK).

Simulation stuff does seem like it could be one in-principle application here, e.g.: "if we create civilization simulations, then this makes it more likely that others whose actions are correlated with ours create simulations, in which case we're more likely to be in a simulation, so because we don't want to be in a simulation, this is a reason to not create simulations." But it seems there are various empirical assumptions about the correlations at stake here, and I haven't thought about cases like this much (and simulation stuff gets gnarly fast, even without bringing weird decision theory in).

Can you control the past?

Thanks for these comments. 

Re: “physics-based priors,” I don't think I have a full sense of what you have in mind, but at a high level, I don’t yet see how physics comes into the debate. That is, AFAICT everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past, “change” the past, and so on. The question as I see it (and perhaps I should’ve emphasized this more in the post, and/or put things less provocatively) is more conceptual/normative: whether when making decisions we should think of the past the way CDT does — e.g., as a set of variables whose probabilities our decision-making can’t alter — or in the way that e.g. EDT does — e.g., as a set of variables whose probabilities our decision-making can alter (and thus, a set of variables that EDT-ish decision-making implicitly tries to “control” in a non-causal sense). Non-causal decision theories are weird; but they aren’t actually “I don’t believe in normal physics” weird. They’re more “I believe in managing the news about the already-fixed past” weird. 

Re: CDT’s domain of applicability, it sounds like your view is something like: “CDT generally works, but it fails in the type of cases that Joe treats as counter-examples to CDT.” I agree with this, and I think most people who reject CDT would agree, too (after all, most decision theories agree on what to do in most everyday cases; the traditional questions have been about what direction to go when their verdicts come apart). I’m inclined to think of this as CDT being wrong, because I’m inclined to think of decision theory as searching for the theory that will get the full range of cases right -- but I’m not sure that much hinges on this. That said, I do think that even acknowledging that CDT fails sometimes involves rejecting some principles/arguments one might’ve thought would hold good in general (e.g. “c’mon, man, it’s no use trying to control the past,” the “what would your friend who can see what's in the boxes say is better” argument, and so on) and thereby saying some striking and weird stuff (e.g. “Ok, it makes sense to try to control the past sometimes, just not that often”).

Re: 1-4, I agree that whether or not CDT leads you astray in a given case is an empirical question. I don’t have strong views about what range of actual cases are like this — though I’m sympathetic to your view re: 1, and as I mention in the post, I generally think we should just err on the side of not doing stuff that looks silly by normal lights. I also don’t have strong views about the relevance of non-causal decision-theory research for AGI safety (this project mostly emerged from personal interest).
