Bleh, sorry, looks like this cross-posted twice from LW, but has only one LW copy (and can only be edited via that copy). I'm going to try to keep this version and get rid of the other one. Edit: resolved.
(Copying over my response from LessWrong)
Thanks for writing this -- I’m very excited about people pushing back on/digging deeper re: counting arguments, simplicity arguments, and the other arguments re: scheming I discuss in the report. Indeed, despite the general emphasis I place on empirical work as the most promising source of evidence re: scheming, I also think that there’s a ton more to do to clarify and maybe debunk the more theoretical arguments people offer re: scheming – and I think playing out the dialectic further in this respect ...
(Oops, sorry, something got messed up with the LessWrong cross-posting here, such that the old version of this post, including comments and upvotes, disappeared. Working on trying to fix.)
(Also copied from LW. And partly re-hashing my response from twitter.)
I'm seeing your main argument here as a version of what I call, in section 4.4, a "speed argument against schemers" -- e.g., basically, that SGD will punish the extra reasoning that schemers need to perform.
(I’m generally happy to talk about this reasoning as a complexity penalty, and/or about the params it requires, and/or about circuit-depth -- what matters is the overall "preference" that SGD ends up with. And thinking of this consideration as a different kind of counting argume...
Thanks for this thoughtful comment, Ben. And also, for putting "The Gold Lily" and "Mother and Child" on my radar -- they hadn't been before. I agree that "Mother and Child" evokes some kind of intergenerational project in the way you describe -- "it is your turn to address it." It seems related to the thing I was trying to talk about at the end of the post -- e.g., Gluck asking for some kind of directness and intensity of engagement with life.
Thanks! Re: one in five million and .01% -- thanks, edited. And thanks for pointing to the Augenblick piece -- does look relevant (though my specific interest in that footnote was in constraints applicable to a model where you can only consider some subset of your evidence at any given time).
I'm sorry to hear about this, Nathan. As I say in the post, I do think that the question of how to do gut-stuff right from a practical perspective is distinct from the epistemic angle that the post focuses on, and I think it's important to attend to both.
Noting that the passage you quote in your appendix from my report isn't my definition of the type of AI I'm focused on. I'm focused on AI systems with the following properties:
Hi Jake,
Thanks for this comment. I discuss this sort of case in footnote 33 here -- I think it's a good place to push back on the argument. Quoting what I say there:
...there is, perhaps, some temptation to say “even if I should be indifferent to these people burned alive, I’m not! Screw indifference-ism world! Sounds like a shitty objective normative order anyway – let’s rebel against it.” That is, it feels like indifference-ism worlds have told me what the normative facts are, but they haven’t told me about my loyalty to the normati
A few questions about this:
Thanks for doing this! I've found it useful, and I expect that it will increase my engagement with EA Forum/LW content going forward.
"that just indicates that EDT-type reasoning is built into the plausibility of SIA"
If by this you mean "SIA is only plausible if you accept EDT," then I disagree. I think many of the arguments for SIA -- for example, "you should be 1/4 on each of tails-mon, tails-tues, heads-mon, and heads-tues in Sleeping Beauty with two wakings each, and then update to being a thirder if you learn you're not in heads-tues," "telekinesis doesn't work," "you should be one-half on not-yet-flipped fair coins," "reference classes aren't a thing," etc -- don't depend on EDT...
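The update in that first example is just ordinary Bayesian conditioning, nothing EDT-flavored. A minimal sketch (the world labels are from my comment; the arithmetic is standard conditionalization):

```python
from fractions import Fraction

# Sleeping Beauty variant with two wakings on heads AND two on tails:
# SIA assigns 1/4 to each centered world.
prior = {
    "heads-mon": Fraction(1, 4),
    "heads-tues": Fraction(1, 4),
    "tails-mon": Fraction(1, 4),
    "tails-tues": Fraction(1, 4),
}

# Learn "I'm not in heads-tues": condition on the remaining worlds.
remaining = {w: p for w, p in prior.items() if w != "heads-tues"}
total = sum(remaining.values())
posterior = {w: p / total for w, p in remaining.items()}

p_heads = posterior["heads-mon"]  # heads-mon is the only remaining heads world
print(p_heads)  # prints 1/3 -- the "thirder" answer falls out of plain conditioning
```

So the thirder verdict in the original (one-waking-on-heads) case drops out of conditioning alone, with no decision theory in sight.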
It’s a good question, and one I considered going into in more detail on in the post (I'll add a link to this comment). I think it’s helpful to have in mind two types of people: “people who see the exact same evidence you do” (e.g., they look down on the same patterns of wrinkles on your hands, the same exact fading on the jeans they’re wearing, etc) and “people who might, for all you know about a given objective world, see the exact same evidence you do” (an example here would be “the person in room 2”). By “people in your epistemic situation,” I mean the ...
Cool, this gives me a clearer picture of where you're coming from. I had meant the central question of the post to be whether it ever makes sense to do the EDT-ish try-to-control-the-past thing, even in pretty unrealistic cases -- partly because I think answering "yes" to this is weird and disorienting in itself, even if it doesn't end up making much of a practical difference day-to-day; and partly because a central objection to EDT is that the past, being already fixed, is never controllable in any practically-relevant sense, even in e.g. Newcomb's cases....
Not sure exactly what words people have used, but something like this idea is pretty common in the non-CDT literature, and I think e.g. MIRI explicitly talks about "controlling" things like your algorithm.
I think this is an interesting objection. E.g., "if you're into EDT ex ante, shouldn't you be into EDT ex post, and say that it was a 'good action' to learn about the Egyptians, because you learned that they were better off than you thought in expectation?" I think it depends, though, on how you are doing the ex post evaluation: and the objection doesn't work if the ex post evaluation conditions on the information you learn.
That is, suppose that before you read Wikipedia, you were 50% that the Egyptians were at 0 welfare, and 50% that they were at 10 welfar...
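A minimal numerical sketch of the point, as far as the comment sets it up (the 0 and 10 welfare levels are from the comment; the "learn"/"skip" action labels are hypothetical stand-ins for reading vs. not reading Wikipedia):

```python
# Prior over the Egyptians' (fixed, past) welfare, before reading anything.
p = {"low": 0.5, "high": 0.5}
welfare = {"low": 0.0, "high": 10.0}

# Ex ante expected welfare: 0.5*0 + 0.5*10 = 5.0.
ex_ante = sum(p[s] * welfare[s] for s in p)

def ex_post_value(action, state):
    # Reading Wikipedia reveals the Egyptians' welfare but doesn't change it,
    # so once the ex post evaluation conditions on the state you learned
    # about, "learn" and "skip" score identically.
    return welfare[state]

same_ex_post = all(ex_post_value("learn", s) == ex_post_value("skip", s) for s in p)
print(ex_ante, same_ex_post)
```

On this conditional ex post evaluation, learning good news doesn't count as a "good action," which is why the objection doesn't go through in that form.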
"the emphasis here seems to be much more about whether you can actually have a causal impact on the past" -- I definitely didn't mean to imply that you could have a causal impact on the past. The key point is that the type of control in question is acausal.
I agree that many of these cases involve unrealistic assumptions, and that CDT may well be an effective heuristic most of the time (indeed, I expect that it is).
I don't feel especially hung up on calling it "control" -- ultimately it's the decision theory (e.g., rejecting CDT) that I'm intere...
Thanks for these comments.
Re: “physics-based priors,” I don't think I have a full sense of what you have in mind, but at a high level, I don’t yet see how physics comes into the debate. That is, AFAICT everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past, “change” the past, and so on. The question as I see it (and perhaps I should’ve emphasized this more in the post, and/or put things less provocatively) is more conceptual/normative: whether when making decisions we should think of the past the wa...
I'm imagining computers with sufficiently robust hardware to function deterministically at the software level, in the sense of very reliably performing the same computation, even if there's quantum randomness at a lower level. Imagine two good-quality calculators, manufactured by the same factory using the same process, which add together the same two numbers using the same algorithm, and hence very reliably move through the same high-level memory states and output the same answer. If quantum randomness makes them output different answers, I count that as a "malfunction."
I have sympathy for responses like "look, it's just so clear that you can't control the past in any practically relevant sense that we should basically just assume the type of arguments in this post are wrong somehow." But I'm curious where you think the arguments actually go wrong, if you have a view about that? For example, do you think defecting in perfect deterministic twin prisoner's dilemmas with identical inputs is the way to go?
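To make the twin case concrete, here's a hedged sketch (the payoff numbers are the standard prisoner's dilemma values, an assumption on my part; any PD-ordered payoffs give the same verdict):

```python
# Perfect deterministic twin prisoner's dilemma: both agents run the same
# program on identical inputs, so their actions necessarily match.
PAYOFF = {  # (my_action, twin_action) -> my payoff
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def my_payoff(action):
    # With identical deterministic twins, only the diagonal outcomes are
    # reachable: whatever I choose, my twin chooses too.
    return PAYOFF[(action, action)]

best = max(["C", "D"], key=my_payoff)
print(best, my_payoff(best))  # prints: C 3
```

Restricted to the reachable outcomes, cooperating gets 3 and defecting gets 1 -- which is the pressure the post is pointing at, even granting that you can't causally affect your twin's choice.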
So certainly physics-based priors are a big component, and indeed in some sense are all of it. That is, I think physics-based priors should give you an immediate answer of "you can't influence the past with high probability", and moreover that once you think through the problems in detail the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if laws remain the same), but that boundary-condition priors should still tell us you can't influence the past. I'm happy to elaborate.
F...
Thanks, Richard :). Re: arbitrariness, in a sense the relevant choices might well end up arbitrary (and as you say, subjectivists need to get used to some level of unavoidable arbitrariness), but I do think that it at least seems worth trying to capture/understand some sort of felt difference between e.g. picking between Buridan's bales of hay, and choosing e.g. what career to pursue, even if you don't think there's a "right answer" in either case.
I agree that "infallible" maybe has the wrong implications, here, though I do think that part of the puz...
I'm glad you liked it, Lukas. It does seem like an interesting question how your current confidence in your own values relates to your interest in further "idealization," of what kind, and how much convergence makes a difference. Prima facie, it does seem plausible that greater confidence speaks in favor of "conservatism" about what sorts of idealization you go in for, though I can imagine very uncertain-about-their-values people opting for conservatism, too. Indeed, it seems possible that conservatism is just generally pretty reasonable, here.
Hi Ben,
This does seem like a helpful kind of content to include (here I think of Luke’s section on this here, in the context of his work on moral patienthood). I’ll consider revising to say more in this vein. In the meantime, here are a few updates off the top of my head:
Hi Ben,
A few thoughts on this:
Hi Hadyn,
Thanks for your kind words, and for reading.
(Continued from comment on the main thread)
I'm understanding your main points/objections in this comment as:
The upshot seems to be that Joe, 80k, the AI researcher survey (2008), Holden-2016 are all at about a 3% estimate of AI risk, whereas AI safety researchers now are at about 30%. The latter is a bit lower (or at least differently distributed) than Rob expected, and seems higher than among Joe's advisors.
The divergence is big, but pretty explainable, because it concords with the direction that apparent biases point in. For the 3% camp, the credibility of one's name, brand, or field benefits from making lowball estimates. Whereas the 30% camp is self-select...
I think I share Robby's sense that the methodology seems like it will obscure truth.
That said, I have neither your (Joe) extensive philosophical background nor have spent substantial time like you on a report like this, and I am interested in evidence to the contrary.
To me, it seems like you've tried to lay out an argument in a series of 6 steps, each of which you think very accurately carves the key parts of reality that are relevant, and you've pondered each step for quite a while.
When I ask myself whether I've seen something like this produce great insight, it's ha...
Hi Rob,
Thanks for these comments.
Let’s call “there will be an existential catastrophe from power-seeking AI before 2070” p. I’m understanding your main objections in this comment as:
Sounds right to me. Per a conversation with Aaron a while back, I've been relying on the moderators to tag posts as personal blog, and had been assuming this one would be.
Glad to hear you found it helpful. Unfortunately, I don't think I have a lot to add at the moment re: how to actually pursue moral weighting research, beyond what I gestured at in the post (e.g., trying to solicit lots of your own/other people's intuitions across lots of cases, trying to make them consistent, that kind of thing). Re: articles/papers/posts, you could also take a look at GiveWell's process here, and the moral weight post from Luke Muelhauser I mentioned has a few references at the end that might be helpful (though most of them I haven'...
Hi Michael —
I meant, in the post, for the following paragraphs to address the general issue you mention:
...Some people don’t think that gratitude of this kind makes sense. Being created, we might say, can’t have been “better for” me, because if I hadn’t been created, I wouldn’t exist, and there would be no one that Wilbur’s choice was “worse for.” And if being created wasn’t better for me, the thought goes, then I shouldn’t be grateful to Wilbur for creating me.
Maybe the issues here are complicated, but at a high level: I don’t buy it. I
Hello Joe!
I enjoyed the McMahan/Parfit move of saying things are 'good for' without being 'better for'. I think it's clever, but I don't buy it. It seems like a linguistic sleight of hand and I don't really understand how it works.
I agree we have preferences over existing, but, well, so what? The fact I do or would have a preference does not automatically reveal what the axiological facts are. It's hard to know, even if we grant this, how it extends to not yet existing people. A present non-existing possible person doesn't have any preference, including w...
Thanks! Re: mental manipulation, do you have similar worries even granted that you’ve already been being manipulated in these ways? We can stipulate that there won’t be any increase in the manipulation in question, if you stay. One analogy might be: extreme cognitive biases that you’ve had all along. They just happen to be machine-imposed.
That said, I don’t think this part is strictly necessary for the thought experiment, so I’m fine with folks leaving it out if it trips them up.
Glad to hear you enjoyed it.
I haven't engaged much with tranquilism. Glancing at that piece, I do think that the relevant notions of "craving" and "clinging" are similar; but I wouldn't say, for example, that an absence of clinging makes an experience as good as it can be for someone.
Thanks :). I haven't thought much about personal universes, but glancing at the paper, I'd expect resource-distribution, for example, to remain an issue.
Glad to hear it :)
Re: "my motivational system is broken, I'll try to fix it" as the thing to say as an externalist realist: I think this makes sense as a response. The main thing that seems weird to me is the idea that you're fundamentally "cut off" from seeing what's good about helium, even though there's nothing you don't understand about reality. But it's a weird case to imagine, and the relevant notions of "cut off" and "understanding" are tricky.
Thanks for reading. Re: your version of anti-realism: is "I should create flourishing (or whatever your endorsed theory says)" in your mouth/from your perspective true, or not truth-apt?
To me Clippy's having or not having a moral theory doesn't seem very central. E.g., we can imagine versions in which Clippy (or some other human agent) is quite moralizing, non-specific, universal, etc about clipping, maximizing pain, or whatever.
Thanks for making this, Michel :)