Bleh, sorry, looks like this cross-posted twice from LW, but has only one LW copy (and can only be edited via that copy). I'm going to try to keep this version and get rid of the other one. Edit: resolved.
(Copying over my response from LessWrong)
Thanks for writing this -- I’m very excited about people pushing back on/digging deeper re: counting arguments, simplicity arguments, and the other arguments re: scheming I discuss in the report. Indeed, despite the general emphasis I place on empirical work as the most promising source of evidence re: scheming, I also think that there’s a ton more to do to clarify and maybe debunk the more theoretical arguments people offer re: scheming – and I think playing out the dialectic further in this respect ...
(Oops, sorry, something got messed up with the LessWrong cross-posting here, such that the old version of this post, including comments and upvotes, disappeared. Working on trying to fix.)
(Also copied from LW. And partly re-hashing my response from twitter.)
I'm seeing your main argument here as a version of what I call, in section 4.4, a "speed argument against schemers" -- e.g., basically, that SGD will punish the extra reasoning that schemers need to perform.
(I’m generally happy to talk about this reasoning as a complexity penalty, and/or about the params it requires, and/or about circuit-depth -- what matters is the overall "preference" that SGD ends up with. And thinking of this consideration as a different kind of counting argume...
Thanks for this thoughtful comment, Ben. And also, for putting "The Gold Lily" and "Mother and Child" on my radar -- they hadn't been before. I agree that "Mother and Child" evokes some kind of intergenerational project in the way you describe -- "it is your turn to address it." It seems related to the thing I was trying to talk about at the end of the post -- e.g., Gluck asking for some kind of directness and intensity of engagement with life.
Thanks! Re: one in five million and .01% -- thanks, edited. And thanks for pointing to the Augenblick piece -- does look relevant (though my specific interest in that footnote was in constraints applicable to a model where you can only consider some subset of your evidence at any given time).
I'm sorry to hear about this, Nathan. As I say in the post, I do think that the question of how to do gut-stuff right from a practical perspective is distinct from the epistemic angle that the post focuses on, and I think it's important to attend to both.
Noting that the passage you quote in your appendix from my report isn't my definition of the type of AI I'm focused on. I'm focused on AI systems with the following properties:
Hi Jake,
Thanks for this comment. I discuss this sort of case in footnote 33 here -- I think it's a good place to push back on the argument. Quoting what I say there:
...there is, perhaps, some temptation to say “even if I should be indifferent to these people burned alive, I’m not! Screw indifference-ism world! Sounds like a shitty objective normative order anyway – let’s rebel against it.” That is, it feels like indifference-ism worlds have told me what the normative facts are, but they haven’t told me about my loyalty to the normati
A few questions about this:
Thanks for doing this! I've found it useful, and I expect that it will increase my engagement with EA Forum/LW content going forward.
"that just indicates that EDT-type reasoning is built into the plausibility of SIA"
If by this you mean "SIA is only plausible if you accept EDT," then I disagree. I think many of the arguments for SIA -- for example, "you should be 1/4 on each of tails-mon, tails-tues, heads-mon, and heads-tues in Sleeping Beauty with two wakings each, and then update to being a thirder if you learn you're not in heads-tues," "telekinesis doesn't work," "you should be one-half on not-yet-flipped fair coins," "reference classes aren't a thing," etc -- don't depend on EDT...
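The update in that first example is just ordinary Bayesian conditioning, nothing EDT-flavored. A minimal sketch (the world labels are from my comment; the arithmetic is standard conditionalization):

```python
from fractions import Fraction

# Sleeping Beauty variant with two wakings on heads AND two on tails:
# SIA assigns 1/4 to each centered world.
prior = {
    "heads-mon": Fraction(1, 4),
    "heads-tues": Fraction(1, 4),
    "tails-mon": Fraction(1, 4),
    "tails-tues": Fraction(1, 4),
}

# Learn "I'm not in heads-tues": condition on the remaining worlds.
remaining = {w: p for w, p in prior.items() if w != "heads-tues"}
total = sum(remaining.values())
posterior = {w: p / total for w, p in remaining.items()}

p_heads = posterior["heads-mon"]  # heads-mon is the only remaining heads world
print(p_heads)  # prints 1/3 -- the "thirder" answer falls out of plain conditioning
```

So the thirder verdict in the original (one-waking-on-heads) case drops out of conditioning alone, with no decision theory in sight.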
It’s a good question, and one I considered going into in more detail on in the post (I'll add a link to this comment). I think it’s helpful to have in mind two types of people: “people who see the exact same evidence you do” (e.g., they look down on the same patterns of wrinkles on your hands, the same exact fading on the jeans they’re wearing, etc) and “people who might, for all you know about a given objective world, see the exact same evidence you do” (an example here would be “the person in room 2”). By “people in your epistemic situation,” I mean the ...
Cool, this gives me a clearer picture of where you're coming from. I had meant the central question of the post to be whether it ever makes sense to do the EDT-ish try-to-control-the-past thing, even in pretty unrealistic cases -- partly because I think answering "yes" to this is weird and disorienting in itself, even if it doesn't end up making much of a practical difference day-to-day; and partly because a central objection to EDT is that the past, being already fixed, is never controllable in any practically-relevant sense, even in e.g. Newcomb's cases....
Not sure exactly what words people have used, but something like this idea is pretty common in the non-CDT literature, and I think e.g. MIRI explicitly talks about "controlling" things like your algorithm.
I think this is an interesting objection. E.g., "if you're into EDT ex ante, shouldn't you be into EDT ex post, and say that it was a 'good action' to learn about the Egyptians, because you learned that they were better off than you thought in expectation?" I think it depends, though, on how you are doing the ex post evaluation: and the objection doesn't work if the ex post evaluation conditions on the information you learn.
That is, suppose that before you read Wikipedia, you were 50% that the Egyptians were at 0 welfare, and 50% that they were at 10 welfar...
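A minimal numerical sketch of the point, as far as the comment sets it up (the 0 and 10 welfare levels are from the comment; the "learn"/"skip" action labels are hypothetical stand-ins for reading vs. not reading Wikipedia):

```python
# Prior over the Egyptians' (fixed, past) welfare, before reading anything.
p = {"low": 0.5, "high": 0.5}
welfare = {"low": 0.0, "high": 10.0}

# Ex ante expected welfare: 0.5*0 + 0.5*10 = 5.0.
ex_ante = sum(p[s] * welfare[s] for s in p)

def ex_post_value(action, state):
    # Reading Wikipedia reveals the Egyptians' welfare but doesn't change it,
    # so once the ex post evaluation conditions on the state you learned
    # about, "learn" and "skip" score identically.
    return welfare[state]

same_ex_post = all(ex_post_value("learn", s) == ex_post_value("skip", s) for s in p)
print(ex_ante, same_ex_post)
```

On this conditional ex post evaluation, learning good news doesn't count as a "good action," which is why the objection doesn't go through in that form.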
"the emphasis here seems to be much more about whether you can actually have a causal impact on the past" -- I definitely didn't mean to imply that you could have a causal impact on the past. The key point is that the type of control in question is acausal.
I agree that many of these cases involve unrealistic assumptions, and that CDT may well be an effective heuristic most of the time (indeed, I expect that it is).
I don't feel especially hung up on calling it "control" -- ultimately it's the decision theory (e.g., rejecting CDT) that I'm intere...
Thanks for these comments.
Re: “physics-based priors,” I don't think I have a full sense of what you have in mind, but at a high level, I don’t yet see how physics comes into the debate. That is, AFAICT everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past, “change” the past, and so on. The question as I see it (and perhaps I should’ve emphasized this more in the post, and/or put things less provocatively) is more conceptual/normative: whether when making decisions we should think of the past the wa...
I'm imagining computers with sufficiently robust hardware to function deterministically at the software level, in the sense of very reliably performing the same computation, even if there's quantum randomness at a lower level. Imagine two good-quality calculators, manufactured by the same factory using the same process, which add together the same two numbers using the same algorithm, and hence very reliably move through the same high-level memory states and output the same answer. If quantum randomness makes them output different answers, I count that as a "malfunction."
I have sympathy for responses like "look, it's just so clear that you can't control the past in any practically relevant sense that we should basically just assume the type of arguments in this post are wrong somehow." But I'm curious where you think the arguments actually go wrong, if you have a view about that? For example, do you think defecting in perfect deterministic twin prisoner's dilemmas with identical inputs is the way to go?
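To make the twin case concrete, here's a hedged sketch (the payoff numbers are the standard prisoner's dilemma values, an assumption on my part; any PD-ordered payoffs give the same verdict):

```python
# Perfect deterministic twin prisoner's dilemma: both agents run the same
# program on identical inputs, so their actions necessarily match.
PAYOFF = {  # (my_action, twin_action) -> my payoff
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def my_payoff(action):
    # With identical deterministic twins, only the diagonal outcomes are
    # reachable: whatever I choose, my twin chooses too.
    return PAYOFF[(action, action)]

best = max(["C", "D"], key=my_payoff)
print(best, my_payoff(best))  # prints: C 3
```

Restricted to the reachable outcomes, cooperating gets 3 and defecting gets 1 -- which is the pressure the post is pointing at, even granting that you can't causally affect your twin's choice.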
So certainly physics-based priors are a big component, and indeed in some sense are all of it. That is, I think physics-based priors should give you an immediate answer of "you can't influence the past with high probability", and moreover that once you think through the problems in detail the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if laws remain the same), but that boundary-condition priors should still tell us you can't influence the past. I'm happy to elaborate.
F...
Thanks, Richard :). Re: arbitrariness, in a sense the relevant choices might well end up arbitrary (and as you say, subjectivists need to get used to some level of unavoidable arbitrariness), but I do think that it at least seems worth trying to capture/understand some sort of felt difference between e.g. picking between Buridan's bales of hay, and choosing e.g. what career to pursue, even if you don't think there's a "right answer" in either case.
I agree that "infallible" maybe has the wrong implications, here, though I do think that part of the puz...
I'm glad you liked it, Lukas. It does seem like an interesting question how your current confidence in your own values relates to your interest in further "idealization," of what kind, and how much convergence makes a difference. Prima facie, it does seem plausible that greater confidence speaks in favor of "conservatism" about what sorts of idealization you go in for, though I can imagine very uncertain-about-their-values people opting for conservatism, too. Indeed, it seems possible that conservatism is just generally pretty reasonable, here.
Hi Ben,
This does seem like a helpful kind of content to include (here I think of Luke’s section on this here, in the context of his work on moral patienthood). I’ll consider revising to say more in this vein. In the meantime, here are a few updates off the top of my head:
Hi Ben,
A few thoughts on this:
Hi Hadyn,
Thanks for your kind words, and for reading.
(Continued from comment on the main thread)
I'm understanding your main points/objections in this comment as:
The upshot seems to be that Joe, 80k, the AI researcher survey (2008), Holden-2016 are all at about a 3% estimate of AI risk, whereas AI safety researchers now are at about 30%. The latter is a bit lower (or at least differently distributed) than Rob expected, and seems higher than among Joe's advisors.
The divergence is big, but pretty explainable, because it concords with the direction that apparent biases point in. For the 3% camp, the credibility of one's name, brand, or field benefits from making lowball estimates. Whereas the 30% camp is self-select...
I think I share Robby's sense that the methodology seems like it will obscure truth.
That said, I have neither your (Joe) extensive philosophical background nor have spent substantial time like you on a report like this, and I am interested in evidence to the contrary.
To me, it seems like you've tried to lay out an argument in a series of 6 steps, each of which you think very accurately carves the key parts of reality that are relevant, and you've pondered each step for quite a while.
When I ask myself whether I've seen something like this produce great insight, it's ha...
Hi Rob,
Thanks for these comments.
Let’s call “there will be an existential catastrophe from power-seeking AI before 2070” p. I’m understanding your main objections in this comment as:
Sounds right to me. Per a conversation with Aaron a while back, I've been relying on the moderators to tag posts as personal blog, and had been assuming this one would be.
Glad to hear you found it helpful. Unfortunately, I don't think I have a lot to add at the moment re: how to actually pursue moral weighting research, beyond what I gestured at in the post (e.g., trying to solicit lots of your own/other people's intuitions across lots of cases, trying to make them consistent, that kind of thing). Re: articles/papers/posts, you could also take a look at GiveWell's process here, and the moral weight post from Luke Muelhauser I mentioned has a few references at the end that might be helpful (though most of them I haven'...
Hi Michael —
I meant, in the post, for the following paragraphs to address the general issue you mention:
...Some people don’t think that gratitude of this kind makes sense. Being created, we might say, can’t have been “better for” me, because if I hadn’t been created, I wouldn’t exist, and there would be no one that Wilbur’s choice was “worse for.” And if being created wasn’t better for me, the thought goes, then I shouldn’t be grateful to Wilbur for creating me.
Maybe the issues here are complicated, but at a high level: I don’t buy it. I
Hello Joe!
I enjoyed the McMahan/Parfit move of saying things are 'good for' without being 'better for'. I think it's clever, but I don't buy it. It seems like a linguistic sleight of hand and I don't really understand how it works.
I agree we have preferences over existing, but, well, so what? The fact I do or would have a preference does not automatically reveal what the axiological facts are. It's hard to know, even if we grant this, how it extends to not yet existing people. A present non-existing possible person doesn't have any preference, including w...
Thanks! Re: mental manipulation, do you have similar worries even granted that you’ve already been being manipulated in these ways? We can stipulate that there won’t be any increase in the manipulation in question, if you stay. One analogy might be: extreme cognitive biases that you’ve had all along. They just happen to be machine-imposed.
That said, I don’t think this part is strictly necessary for the thought experiment, so I’m fine with folks leaving it out if it trips them up.
Glad to hear you enjoyed it.
I haven't engaged much with tranquilism. Glancing at that piece, I do think that the relevant notions of "craving" and "clinging" are similar; but I wouldn't say, for example, that an absence of clinging makes an experience as good as it can be for someone.
Thanks :). I haven't thought much about personal universes, but glancing at the paper, I'd expect resource-distribution, for example, to remain an issue.
Glad to hear it :)
Re: "my motivational system is broken, I'll try to fix it" as the thing to say as an externalist realist: I think this makes sense as a response. The main thing that seems weird to me is the idea that you're fundamentally "cut off" from seeing what's good about helium, even though there's nothing you don't understand about reality. But it's a weird case to imagine, and the relevant notions of "cut off" and "understanding" are tricky.
Thanks for reading. Re: your version of anti-realism: is "I should create flourishing (or whatever your endorsed theory says)" in your mouth/from your perspective true, or not truth-apt?
To me Clippy's having or not having a moral theory doesn't seem very central. E.g., we can imagine versions in which Clippy (or some other human agent) is quite moralizing, non-specific, universal, etc about clipping, maximizing pain, or whatever.
Thanks for making this, Michel :)