Over at The 80,000 Hours Podcast we just published an interview that is likely to be of particular interest to people who identify as involved in the effective altruism community: Joe Carlsmith on navigating serious philosophical confusion.
You can click through for the audio, a full transcript, and related links. Below is the episode summary and some key excerpts.
…if you really think that there’s a good chance that you’re not understanding things, then something that you could do that at least probably has some shot of helping is to put future generations in a better position to solve these questions — once they have lots of time and hopefully are a whole lot smarter and much more informed than we are…
What is the nature of the universe? How do we make decisions correctly? What differentiates right actions from wrong ones?
Such fundamental questions have been the subject of philosophical and theological debates for millennia. But, as we all know, and surveys of expert opinion make clear, we are very far from agreement. So… with these most basic questions unresolved, what’s a species to do?
In today’s episode, philosopher Joe Carlsmith — Senior Research Analyst at Open Philanthropy — makes the case that many current debates in philosophy ought to leave us confused and humbled. These are themes he discusses in his PhD thesis, A stranger priority? Topics at the outer reaches of effective altruism.
To help transmit the disorientation he thinks is appropriate, Joe presents three disconcerting theories — originating from him and his peers — that challenge humanity’s self-assured understanding of the world.
The first idea is that we might be living in a computer simulation, because, in the classic formulation, if most civilisations go on to run many computer simulations of their past history, then most beings who perceive themselves as living in such a history must themselves be in computer simulations. Joe prefers a somewhat different way of making the point, but, having looked into it, he hasn’t identified any particular rebuttal to this ‘simulation argument.’
If true, it could revolutionise our comprehension of the universe and the way we ought to live.
The second is the idea that “you can ‘control’ events you have no causal interaction with, including events in the past.” The thought experiment that most persuades him of this is the following:
Perfect deterministic twin prisoner’s dilemma: You’re a deterministic AI system, who only wants money for yourself (you don’t care about copies of yourself). The authorities make a perfect copy of you, separate you and your copy by a large distance, and then expose you both, in simulation, to exactly identical inputs (let’s say, a room, a whiteboard, some markers, etc.). You both face the following choice: either (a) send a million dollars to the other (“cooperate”), or (b) take a thousand dollars for yourself (“defect”).
Joe thinks, in contrast with the dominant theory of correct decision-making, that it’s clear you should send a million dollars to your twin. But as he explains, this idea, when extrapolated outwards to other cases, implies that it could be sensible to take actions in the hope that they’ll improve parallel universes you can never causally interact with — or even to improve the past. That is nuts by anyone’s lights, including Joe’s.
The third disorienting idea is that, as far as we can tell, the universe could be infinitely large. And that fact, if true, would mean we probably have to make choices between actions and outcomes that involve infinities. Unfortunately, doing that breaks our existing ethical systems, which are only designed to accommodate finite cases.
In an infinite universe, our standard models end up unable to say much at all, or give the wrong answers entirely. While we might hope to patch them in straightforward ways, having looked into ways we might do that, Joe has concluded they all quickly get complicated and arbitrary, and still have to do enormous violence to our common sense. For people inclined to endorse some flavour of utilitarianism, Joe thinks ‘infinite ethics’ spells the end of the ‘utilitarian dream’ of a moral philosophy that has the virtue of being very simple while still matching our intuitions in most cases.
These are just three particular instances of a much broader set of ideas that some have dubbed the “train to crazy town.” Basically, if you commit to always take philosophy and arguments seriously, and try to act on them, it can lead to what seem like some pretty crazy and impractical places. So what should we do with this buffet of plausible-sounding but bewildering arguments?
Joe and Rob discuss to what extent this should prompt us to pay less attention to philosophy, and how we as individuals can cope psychologically with feeling out of our depth just trying to make the most basic sense of the world.
In the face of all of this, Joe suggests that there is a promising and robust path for humanity to take: keep our options open and put our descendants in a better position to figure out the answers to questions that seem impossible for us to resolve today — a position he calls “wisdom longtermism.”
Joe fears that if people believe we understand the universe better than we really do, they’ll be more likely to try to commit humanity to a particular vision of the future, or be uncooperative to others, in ways that only make sense if you were certain you knew what was right and wrong.
In today’s challenging conversation, Joe and Rob discuss all of the above, as well as:
- What Joe doesn’t like about the drowning child thought experiment
- An alternative thought experiment about helping a stranger that might better highlight our intrinsic desire to help others
- What Joe doesn’t like about the expression “the train to crazy town”
- Whether Elon Musk should place a higher probability on living in a simulation than most other people
- Whether the deterministic twin prisoner’s dilemma, if fully appreciated, gives us an extra reason to keep promises
- To what extent learning to doubt our own judgement about difficult questions — so-called “epistemic learned helplessness” — is a good thing
- How strong the case is that advanced AI will engage in generalised power-seeking behaviour
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app.
Producer: Keiran Harris
Audio mastering: Milo McGuire and Ben Cordell
Transcriptions: Katy Moore
The deterministic twin prisoner's dilemma
Joe Carlsmith: The experiment that convinces me most is: Imagine that you are a deterministic AI system and you only care about money for yourself. So you’re selfish. There’s also a copy of you, a perfect copy, and you’ve both been separated very far away — maybe you’re on spaceships flying in opposite directions or something like that. And you’re both going to face the exact same inputs. So you’re deterministic: the only way you’re going to make a different choice is if the computers malfunction or something like that. Otherwise you’re going to see the exact same environment.
In the environment, you have the option of taking $1,000 for yourself: we’ll call that “defecting” — or giving $1 million to the other guy: we’ll call that “cooperating.” The structure is similar to a prisoner’s dilemma. You’re going to make your choice, and then later you’re going to rendezvous.
So what should you do? Well, here’s an argument that I don’t find convincing, but that I think would be the argument offered by someone who thinks you can only control what you can cause. The argument would be something like: your choice doesn’t cause that guy’s choice. He’s far away; maybe he’s lightyears away. You should treat his choice as fixed. And then whatever he chooses, you get more money if you defect. If he defects, then you’ll get nothing by cooperating and $1,000 by defecting. If he sends the money to you, then you’ll get $1.001 million by defecting and $1 million by cooperating. No matter what, it’s better to defect. So you should defect.
But I think that’s wrong. The reason I think it’s wrong is that you are going to make the same choice. You’re deterministic systems, and so whatever you do, he’s going to do it too. In fact, in this particular case — and we can talk about looser versions where the inputs aren’t exactly identical — the connection between you two is so tight that literally, if you want to write something on your whiteboard, he’s going to write that too. If you want him to write on his whiteboard, “Hello, this is a message from your copy,” or something like that, you can just write it on your own whiteboard. When you guys rendezvous, his whiteboard will say the thing that you wrote. You can sit there going, “What do I want?” You really can control what he writes. If you want to draw a particular kitten, if you want to scribble in a certain way, he’s going to do that exact same thing, even though he’s far away and you’re not in causal interaction with him.
To me, I think there’s just a weird form of control you have over what he does that we just need to recognise. So I think that’s relevant to your decision, in the sense that if you start reaching for the defect button, you should be like, “OK, what button is he reaching for right now?” As you move your arm, his arm is moving with you. And so you reach for the defect, he’s about to defect. You could basically be like, “What button do I want him to press?” and just press it yourself and he’ll press it. So to me, it feels pretty easy to press the “send myself $1 million” button.
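The two lines of reasoning above can be laid out as a small payoff table. This is only an illustrative sketch: the dollar amounts come from the thought experiment, while the function and variable names (`payoff`, `dominance`, `mirrored`) are made up for this example. It shows why defecting looks better when the twin's choice is held fixed, and why cooperating comes out ahead once you assume, as Joe does, that a perfect deterministic copy always makes the same choice as you.

```python
# Payoffs from the thought experiment: defecting takes $1,000 for yourself,
# cooperating sends $1,000,000 to the other copy.
TAKE = 1_000
GIFT = 1_000_000

CHOICES = ("cooperate", "defect")


def payoff(my_choice: str, twin_choice: str) -> int:
    """Money I end up with, given my choice and my twin's choice."""
    mine = TAKE if my_choice == "defect" else 0
    from_twin = GIFT if twin_choice == "cooperate" else 0
    return mine + from_twin


# "You can only control what you cause": hold the twin's choice fixed
# and compare my two options within each row.
dominance = {
    twin: {me: payoff(me, twin) for me in CHOICES}
    for twin in CHOICES
}

# Joe's point: identical deterministic systems with identical inputs choose
# identically, so only the outcomes where both choices match can occur.
mirrored = {me: payoff(me, me) for me in CHOICES}

if __name__ == "__main__":
    for twin, row in dominance.items():
        print(f"if the twin {twin}s: {row}")
    print(f"if the twin mirrors me: {mirrored}")
```

In every row of `dominance`, defecting pays exactly $1,000 more, which is the argument Joe rejects. In `mirrored`, cooperating pays $1,000,000 against $1,000 for defecting, which is the comparison he thinks actually matters.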
Joe Carlsmith: The classic thought experiment that people often focus on, though I don’t think it’s the most dispositive, is this case called Newcomb’s problem, where Omega is this kind of superintelligent predictor of your actions. Omega puts you in the situation where you face two boxes: one of them is opaque, one of them is transparent. The transparent box has $1,000, the opaque box has either $1 million or nothing.
Omega puts $1 million in the box if Omega predicts that you will take only the opaque box and leave the $1,000 alone (even though you can see it right there). And Omega puts nothing in the opaque box if Omega predicts that you will take both boxes.
So the same argument arises for the causal decision theory (CDT). For CDT, the thought is: you can’t change what’s in the boxes; the boxes are already fixed. Omega already made her prediction. And no matter what, you’ll get more money if you take the $1,000. If there was some dude over there who could see the boxes, and you were like, “Hey, see what’s in the box, and what choice will give me more money?” — you don’t even need to ask, because you know it’s always just take the extra $1,000.
But I think you should one-box in this case, because I think if you one-box then it will have been the case that Omega predicted that you one-boxed, because Omega is always right about the predictions, and so there will be the million.
I think a way to pump this intuition for me that matters is imagining doing this case over and over with Monopoly money. Each time, I try taking two boxes and I notice the opaque box is empty. I take one box, opaque box is full. I do this over and over. I try doing intricate mental gymnastics. I do like a somersault, I take the boxes. I flip a coin and take the box — well, flipping a coin, Omega has to be really good, so we can talk about that.
If Omega is sufficiently good at predicting your choice, then just like every time, what you eventually will learn is that you effectively have a type of magical power. Like I can just wave my arms over the opaque box and say, “Shazam! I hereby declare that this box shall be full with $1 million. Thus, as I one-box, it is so.” Or if I can be like, “Shazam! I declare that the box shall be empty. Like thus, as I two-box, it is so.” I think eventually you just get it in your bones, such that when you finally face the real money, I guess I expect this feeling of like, “I know this one, I’ve seen this before.” I kind of know what’s going to happen at some more visceral expectation level if I one-box or two-box, and I know which one leaves me rich.
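The "doing this over and over" intuition can be put into rough numbers. The sketch below is an illustration under stated assumptions, not something from the episode: the `accuracy` parameter (the chance that Omega's prediction matches your actual choice) and the function names are hypothetical, and the calculation conditions on your own choice, which is exactly the step a causal decision theorist would reject. The dollar amounts are those in the excerpt.

```python
SMALL = 1_000      # always visible in the transparent box
BIG = 1_000_000    # placed in the opaque box only if one-boxing was predicted


def winnings(choice: str, prediction: str) -> int:
    """Money received for a given choice and a given prediction by Omega."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque if choice == "one-box" else opaque + SMALL


def expected_winnings(choice: str, accuracy: float) -> float:
    """Average winnings if the prediction matches the choice with
    probability `accuracy` (an assumed parameter, not from the episode)."""
    other = "two-box" if choice == "one-box" else "one-box"
    return (accuracy * winnings(choice, choice)
            + (1 - accuracy) * winnings(choice, other))


if __name__ == "__main__":
    for accuracy in (0.5, 0.9, 1.0):
        one = expected_winnings("one-box", accuracy)
        two = expected_winnings("two-box", accuracy)
        print(f"accuracy={accuracy}: one-box={one:,.0f}, two-box={two:,.0f}")
```

On this way of counting, a predictor that is no better than a coin flip leaves two-boxing slightly ahead, while a perfect predictor gives $1,000,000 for one-boxing against $1,000 for two-boxing, matching the pattern Joe describes noticing in the Monopoly-money trials.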
The idea of 'wisdom longtermism'
Joe Carlsmith: In the thesis, I have this distinction between what I call “welfare longtermism” and “wisdom longtermism.”
Welfare longtermism is roughly the idea that our moral focus should be on specifically the welfare of the finite number of future people who might live in our lightcone.
And wisdom longtermism is a broader idea that our moral focus should be reaching a kind of wise and empowered civilisation in general. I think of welfare longtermism as a lower bound on the stakes of the future more broadly — at the very least, the future matters at least as much as the welfare of the future people matters. But to the extent there are other issues that might be game changing or even more important, I think the future will be in a much better position to deal with those than we are, at least if we can make the right future. …
There’s a line in Nick Bostrom’s book Superintelligence about something like, if you’re digging a hole but there’s a bulldozer coming, maybe you should wonder about the value of digging a hole. I also think we’re plausibly on the cusp of pretty radical advances in humanity’s understanding of science and other things, where there might be a lot more leverage and a lot more impact from making sure that the stuff you’re doing matters specifically to how that goes, rather than to just kind of increasing our share of knowledge overall. You want to be focusing on decisions we need to make now that we would have wanted to make differently.
So it looks good to me, the focus on the long-term future. I want to be clear that I think it’s not perfectly safe. I think a thing we just generally need to give up is the hope that we will have a theory that makes sense of everything — such that we know that we’re acting in the safe way, that it’s not going to go wrong, and it’s not going to backfire. I think there can be a way that people look to philosophy as a kind of mode of Archimedean orientation towards the world — that will tell them how to live, and justify their actions, and give a kind of comfort and structure — that I think at some point we need to give up.
On the classic drowning child thought experiment
Joe Carlsmith: I think what that can do is sort of break your conception of yourself as a kind of morally sincere agent — and at a deeper level, it can break your conception of society and your peers, or society as a morally sincere endeavour, in some sense. Things can start to seem kind of sick at their core, and we’re just all looking away from the sense in which we’re horrible people, or something like that.
I actually think part of the attraction of communities like the effective altruism community, for many people, is it sort of offers a vision of a recovery of a certain moral sincerity. You find this community, and actually, these people are maybe trying — more so than you had encountered previously — to really take this stuff seriously, to act rightly by its lights. And I think that can be a powerful idea.
But then this thing comes up, where it’s like, “OK, but how much is enough? Exactly how far do you go with this? What is demanded?” I think people can end up in a mode where their relationship with this is what you said: it’s about not being bad, not sucking (like you thought “maybe I sucked” and now you’re really trying not to suck); you don’t want to be kind of punished or worthy of reproach. It’s a lot about something like guilt. I think that the thought experiment itself is sort of about calling you an asshole. It’s like, “If you didn’t save the child, you’re an asshole.” So everyone’s an asshole.
Rob Wiblin: But look at how you’re living the rest of your life.
Joe Carlsmith: Exactly. I think sometimes you’re an asshole, and we need to be able to notice that. But also, for one thing, it’s actually not clear to me that you’re an asshole for not donating to a charity — that’s not something that we normally think — and I think we should notice that. Also, it doesn’t seem to me like a very healthy or wholehearted basis for engaging with this stuff. I think there are alternatives that are better.
On why bother being good
Rob Wiblin: What are the personal values of yours that motivate you to care to try to help other people, even when it’s kind of a drag, or demoralising, or it feels like you’re not making progress?
Joe Carlsmith: One value that’s important to me, though it’s a little hard to communicate, is something like “looking myself and the world in the eye.” It’s about kind of taking responsibility for what I’m doing; what kind of force I’m going to be in the world in different circumstances; trying to understand myself, understand the world, and understand what in fact I am in relationship to it — and to choose that and endorse that with a sense of agency and ownership.
One way that shows up for me in the context of helping others is trying to take really seriously that my mind is not the world — that the limits of my experience are not the limits of what’s real.
In particular, I wake up and I’m just like Joe every day — every day it’s just Joe stuff; I wake up in the sphere of Joe around me. So Joe stuff is really salient and vivid: there’s this sort of zone — it’s not just my experience, there’s also, like, people and my kitchen — of things that are kind of vivid.
And then there’s a part of the world that my brain is doing a lot less to model — but that doesn’t mean the thing is less real; it’s just my brain is putting in a lot fewer resources to modelling it. So things like other people are just as real as I am. When something happens to me, at least from a certain perspective, that’s not a fundamentally different type of event than when something happens to someone else. So part of living in the real world for me is living in light of that fact, and trying to really stay in connection with just that other people are just as real as I am.
More broadly, when we talk about forms of altruism that are more fully impartial — or trying to ask questions like, “What is really the most good I can do?” — for me, that’s a lot about trying to live in the world as a whole, not artificially limiting which parts of the world I’m treating as real or significant. Because I don’t live in just one part of the world. When I act, I act in a way that affects the whole world, or that can affect the whole world. There’s some sense in which I want to be not imposing some myopia upfront on what is in scope for me. I think those are both core for me in terms of what helping others is about.
I'm deeply confused about this. According to the premise, you are a deterministic AI system. That means what you will do is fully determined by your code and your input, both of which are already given. So at this point, there is no longer any freedom to make a choice: you will just do what your given code and input determine. So what does it mean to ask what you should do? Does that actually mean: (i) what code should your programmer have written? Or does it mean: (ii) what would the right choice be in the counterfactual situation in which you are not deterministic after all and do have a choice (while your twin doesn't? or does as well?)?
In order to answer version (i), we need to know the preferences of the programmer (rather than your own preferences). If the programmer is interested in the joint payoff of both twins, she should have written code that makes you cooperate.
In order to answer version (ii), we would need to know how making either choice, in the counterfactual world where you do have a choice, affects whether the other twin can make a choice. If your choice does not influence whether the other twin can make a choice, the dominant strategy is defection, as in the simple PD. Otherwise, who knows...