Joe_Carlsmith

1695 · Joined Nov 2016

Bio

Senior research analyst at Open Philanthropy. Doctoral student in philosophy at the University of Oxford. Opinions my own.

Sequences
1

SIA > SSA

Comments
31

Noting that the passage you quote in your appendix from my report isn't my definition of the type of AI I'm focused on. I'm focused on AI systems with the following properties:

  • Advanced capability: they outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation). 
  • Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world. 
  • Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining power over humans and the real-world environment.

See section 2.1 in the report for more in-depth description.

Hi Jake, 

Thanks for this comment. I discuss this sort of case in footnote 33 here -- I think it's a good place to push back on the argument. Quoting what I say there:

there is, perhaps, some temptation to say “even if I should be indifferent to these people burned alive, I’m not! Screw indifference-ism world! Sounds like a shitty objective normative order anyway – let’s rebel against it.” That is, it feels like indifference-ism worlds have told me what the normative facts are, but they haven’t told me about my loyalty to the normative facts, and the shittyness of these normative facts puts that loyalty even more in question.  

And perhaps, as well, there’s some temptation to think that “Well, indifference-ism world is morally required to be indifferent to my overall decision-procedure as well – so I’ll use a decision-procedure that isn’t indifferent to what happens in indifference-ism world. Indifference-ism world isn't allowed to care!”  

These responses might seem dicey, though. If they (or others) don't end up working, ultimately I think that biting the bullet and taking this sort of deal is in fact less bad than doing so in the nihilism-focused version or the original. So it’s an option if necessary – and one I’d substantially prefer to biting the bullet in all of them.

That is, I'm interested in some combination of: 

  • Not taking the deal because you're uncertain of your loyalty to the normative facts (e.g., something about internalism/externalism etc)
  • Not taking the deal because indifference-ism world is indifferent to your decision procedure (or to your actions more generally), so whatever, let's save my family in those worlds. 
  • Biting the bullet and taking the deal if it comes to that, but not taking it in the other cases discussed in the post. 

Adding a few more thoughts, I think part of what I'm interested in here is the question of what you would be "trying" to do (from some kind of "I endorse this" perspective, even if the endorsement doesn't have any external backing from the normative facts) conditional on a given world. If, in indifference-ism world, you wouldn't be trying, in this sense, to protect your family, such that your representative from indifference-ism world would indeed be like "yeah, go ahead, burn my family alive," then taking the deal looks more OK to me. But if, conditional on indifference-ism, you would be trying to protect your family anyway (maybe because: the normative facts are indifferent, so might as well), such that your representative from indifference-ism world would be like "I'm against this deal," then taking the deal looks worse to me. And the second thing seems more like where I'd expect to end up.

A few questions about this: 

  1. Does this view imply that it is actually not possible to have a world where e.g. a machine creates one immortal happy person per day, forever, who then form an ever-growing line?
  2. How does this view interpret cosmological hypotheses on which the universe is infinite? Is the claim that actually, on those hypotheses, the universe is finite after all? 
  3. It seems like lots of the (countable) worlds and cases discussed in the post can simply be reframed as never-ending processes, no? And then similar (identical?) questions will arise? Thus, for example, w5 is equivalent to a machine that creates a1 at -1, then a3 at -1, then a5 at -1, etc. w6 is equivalent to a machine that creates a1 at -1, then a2 at -1, a3 at -1, etc. What would this view say about which of these machines we should create, given the opportunity? How should we compare these to a w8 machine that creates b1 at -1, b2 at -1, b3 at -1, b4 at -1, etc?

Re: the Jaynes quote: I'm not sure I've understood the full picture here, but in general, to me it doesn't feel like the central issues here have to do with dependencies on "how the limit is approached," such that requiring that each scenario pin down an "order" solves the problems. For example, I think that a lot of what seems strange about Neutrality-violations in these cases is that even if we pin down an order for each case, the fact that you can re-arrange one into the other makes it seem like they ought to be ethically equivalent. Maybe we deny that, and maybe we do so for reasons related to what you're talking about - but it seems like the same bullet. 

Thanks for doing this! I've found it useful, and I expect that it will increase my engagement with EA Forum/LW content going forward.

"that just indicates that EDT-type reasoning is built into the plausibility of SIA"

If by this you mean "SIA is only plausible if you accept EDT," then I disagree. I think many of the arguments for SIA -- for example, "you should be 1/4 on each of tails-mon, tails-tues, heads-mon, and heads-tues in Sleeping Beauty with two wakings each, and then update to being a thirder if you learn you're not in heads-tues," "telekinesis doesn't work," "you should be one-half on not-yet-flipped fair coins," "reference classes aren't a thing," etc -- don't depend on EDT, or even on EDT-ish intuitions.

you talk about contorting one's epistemology in order to bet a particular way, but what's the alternative? If I'm an EDT agent who wants to bet at odds of a third, what is the principled reasoning that leads me to have credence of a half?

The alternative is to just bet the way you want to anyway, in the same way that the (most attractive, imo) alternative to two-boxing in transparent newcomb is not "believe that the boxes are opaque" but "one-box even though you know they're transparent." You don't need to have a credence of a half to bet how you want to -- especially if you're updateless. And note that EDT-ish SSA-ers have the fifthing problem too, in cases like the "wake up twice regardless, then learn that you're not heads-tuesday" version I just mentioned (where SSA ends up at 1/3rd on heads, too).

You argue that questions like "could I have been a chimpanzee" seem ridiculous. But these are closely analogous to the types of questions that one needs to ask when making decisions according to FDT (e.g. "are the decisions of chimpanzees correlated with my own?") So, if we need to grapple with these questions somehow in order to make decisions, grappling with them via our choice of a reference class doesn't seem like the worst way to do so.

I think that "how much are my decisions correlated with those of the chimps?" is a much more meaningful and tractable question, with a much more determinate answer, than "are the chimps in my reference class?" Asking questions about correlations between things is the bread and butter of Bayesianism. Asking questions about anthropic reference classes isn't -- or, doesn't need to be.

I'm reminded of Yudkowsky's writing about why he isn't prepared to get rid of the concept of "anticipated subjective experience", despite the difficulties it poses from a quantum-mechanical perspective.

Thanks for the link. I haven't read this piece, but fwiw, to me it feels like "there is a truth about the way that the world is/about what world I'm living in, I'm trying to figure out what that truth is" is something we shouldn't give up lightly. I haven't engaged much with the QM stuff here, and I can imagine it moving me, but "how are you going to avoid fifth-ing?" doesn't seem like a strong enough push on its own.

It’s a good question, and one I considered going into in more detail in the post (I'll add a link to this comment). I think it’s helpful to have in mind two types of people: “people who see the exact same evidence you do” (e.g., they look down on the same patterns of wrinkles on your hands, the same exact fading on the jeans they’re wearing, etc) and “people who might, for all you know about a given objective world, see the exact same evidence you do” (an example here would be “the person in room 2”). By “people in your epistemic situation,” I mean the former. The latter I think of as actually a disguised set of objective worlds, which posit different locations (and numbers) of the former-type people. But SIA, importantly, likes them both (though on my gloss, liking the former is more fundamental).

Here are some cases to illustrate. Suppose that God creates either one person in room 1 (if heads) or two people (if tails) in rooms 1 and 2. And suppose that there are two types of people: “Alices” and “Bobs.” Let’s say that any given Alice sees the exact same evidence as the other Alices (the same wrinkles, faded jeans, etc), and that the same holds for Bobs, and that if you’re an Alice or a Bob, you know it. Now consider three cases: 

  1. For each person God creates, he flips a second coin. If it’s heads, he creates an Alice. If tails, a Bob. 
  2. God flips a second coin. If it’s heads, he makes the person in room 1 Alice; if tails, Bob. But if the first coin was tails and he needs to create a second person, he makes that person different from the first. Thus, if tails-heads, it’s an Alice in room 1, and a Bob in room 2. But if it’s tails-tails, then it’s a Bob in room 1, and an Alice in room 2. (I talk about this case in part 4, XV.)
  3. God creates all Alices no matter what. 

Let’s write people’s names with “A” or “B,” in order of room number. And let’s say you wake up as an Alice. 

  • In case one, “coin 1 heads” (I’ll write the coin-1 results in parentheses) corresponds to two objective worlds -- A and B -- each with 1/4 prior probability. Coin 1 tails corresponds to four objective worlds -- AA, AB, BA, and BB -- each with 1/8th prior probability. So as Alice, you start by crossing off B and BB, because they contain no Alices. So you’re left with 1/4 on A, and 1/8th on each of AA, AB, and BA, so an overall odds-ratio of 2:1:1:1. But now, as SIA, you scale the prior in proportion to the number of Alices there are, so AA gets double weight. Now you’re 2:2:1:1. Thus, you end up with 1/3rd on A, 1/3rd on AA (with 1/6th on each of the corresponding centered worlds), and 1/6th on each of AB and BA. And you’re a “thirder” overall. 
  • Now let’s look at case two. Here, the prior is 1/4 on A, 1/4 on B, 1/4 on AB, and 1/4 on BA. So SIA doesn’t actually do any scaling of the prior: there’s a maximum of one A in each world. Rather, it crosses off B, and ends up with 1/3rd on anything else, and stays a “thirder” overall. 
  • Case three is just Sleeping Beauty: SIA scales in proportion to the number of Alices, and ends up a thirder overall. 

So in each of these cases, SIA gives the same result, even though the distribution of Alices is in some sense pretty different. And notice, we can redescribe case 1 and 2 in terms of SIA liking “people who, for all you know about a given objective world, might be an Alice” instead of in terms of SIA liking Alices. E.g., in both cases, there are twice as many such people on tails. But importantly, their probability of being an Alice isn’t correlated with coin 1 heads vs. coin 1 tails. 
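As a rough check on the arithmetic in the three cases, here is a minimal Python sketch. It assumes SIA can be modeled as "multiply each objective world's prior by the number of Alices it contains, then renormalize." The function names (`sia_posterior`, `p_heads`) and the encoding of worlds as strings of names in room order (e.g. "AB" for an Alice in room 1 and a Bob in room 2) are illustrative choices, not anything from the post.

```python
from fractions import Fraction
from itertools import product

def sia_posterior(prior, me="A"):
    """Weight each objective world by prior * (number of `me`-type people
    in it), drop worlds with none, and renormalize."""
    weights = {w: p * w.count(me) for w, p in prior.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items() if wt}

def p_heads(posterior):
    """Coin 1 heads means exactly one person was created."""
    return sum(p for w, p in posterior.items() if len(w) == 1)

half = Fraction(1, 2)

# Case 1: coin 1 fixes the number of people; each person gets
# an independent fair coin deciding Alice vs. Bob.
case1 = {}
for n in (1, 2):
    for names in product("AB", repeat=n):
        case1["".join(names)] = half * half**n

# Case 2: a single second coin picks room 1's occupant;
# room 2 (if it exists) gets the other type.
case2 = {w: Fraction(1, 4) for w in ("A", "B", "AB", "BA")}

# Case 3: everyone is an Alice (Sleeping Beauty).
case3 = {"A": half, "AA": half}

for name, prior in [("case 1", case1), ("case 2", case2), ("case 3", case3)]:
    post = sia_posterior(prior)
    print(name, {w: str(p) for w, p in post.items()}, "P(heads) =", p_heads(post))
```

On this modeling assumption, all three priors yield the same probability of coin 1 heads for an Alice, even though the distribution over the tails worlds differs between cases.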

Anthropics cases are sometimes ambiguous about whether they’re talking about cases of type 1 or of type 3. God’s coin toss is closer to case 1: e.g., you wake up as a person in a room, but we didn’t specify that God was literally making exact copies of you in the other rooms -- your reasoning, though, treats his probability of giving any particular objective-world person your exact evidence as constant across people. Sleeping Beauty is often treated as more like case 3, but it’s compatible with being more of a case 1 type (e.g., if the experimenters also flip another coin on each waking, and leave it for Beauty to see, this doesn’t make a difference; and in general, the Beauties could have different subjective experiences on each waking, as long as -- as far as Beauty knows -- these variations in experience are independent of the coin toss outcome). I'm not super careful about these distinctions in the post, partly because actually splitting out all of the possible objective worlds in type-1 cases isn't really do-able (there's no well-defined distribution that God is "choosing from" when he creates each person in God's coin toss -- but his choice is treated, from your perspective, as independent from the coin toss outcome); and as noted, SIA's verdicts end up the same.

Cool, this gives me a clearer picture of where you're coming from. I had meant the central question of the post to be whether it ever makes sense to do the EDT-ish try-to-control-the-past thing, even in pretty unrealistic cases -- partly because I think answering "yes" to this is weird and disorienting in itself, even if it doesn't end up making much of a practical difference day-to-day; and partly because a central objection to EDT is that the past, being already fixed, is never controllable in any practically-relevant sense, even in e.g. Newcomb's cases. It sounds like your main claim is that in our actual everyday circumstances, with respect to things like the WWI case, EDTish and CDT recommendations don't come apart -- a topic I don't spend much time on or have especially strong views about.

"you’re going to lean on the difference between 'cause' and 'control'" -- indeed, and I had meant the "no causal interaction with" part of the opening sentence to indicate this. It does seem like various readers objected to/were confused by the use of the term "control" here, and I think there's room for more emphasis early on as to what specifically I have in mind; but at a high level, I'm inclined to keep the term "control," rather than trying to rephrase things solely in terms of e.g. correlations, because I think it makes sense to think of yourself as, for practical purposes, "controlling" what your copy writes on his whiteboard, what Omega puts in the boxes, etc; that more broadly, EDT-ish decision-making is in fact weird in the way that trying to control the past is weird, and that this makes it all the more striking and worth highlighting that EDT-ish decision-making seems, sometimes, like the right way to go. 

Not sure exactly what words people have used, but something like this idea is pretty common in the non-CDT literature, and I think e.g. MIRI explicitly talks about "controlling" things like your algorithm.
