

Thank you!  It was your speech at the OFTW meeting that largely inspired it. 

We all agree that you should get utility. You are pointing out that FDT agents get more utility. But once they are already in the situation where they've been created by the demon, FDT agents get less utility. If you are the type of agent who follows FDT, you will get more utility, just as, in a scenario that tortures FDTists, you'll get more utility by being the type of agent who follows CDT. The question of decision theory is: given the situation you are in, what gets you more utility? That is, what is the rational thing to do? Eliezer's theory turns you into the type of agent who often gets more utility, but that does not make it the right decision theory. The fact that you want to be the type of agent who does X doesn't make doing X rational if doing X is bad for you and being the type of agent who does X is rewarded artificially.

Again, there is no dispute about whether, on average, one-boxers or two-boxers get more utility, or about which kind of AI you should build.

Wait, sorry, it's hard to see the broader context of this comment because I'm on my phone and comment sections are hard to navigate on the EA Forum. I don't know if I said Eliezer had 100% credence, but if I did, that was wrong.

He didn't quote it; he linked to it. I didn't quote the broader section because it was ambiguous and confusing. Not accounting for interactionist dualism matters because it means that he misstates the zombie argument, and his version is utterly unpersuasive.

The demon case shows that there are cases where FDT loses, as is true of all decision theories. If the question is which decision theory, programmed into an AI, will generate the most utility, then that's an empirical question that depends on facts about the world. If the question is which action will get the most utility once you're already in a situation, well, that's causal decision theory.

Decision theories are intended as theories of what is rational for you to do, so they describe which choices are wise and which are foolish. I think Eliezer is confused about what a decision theory is, and that is a reason to trust his judgment less.

In the demon case, we can assume the demon is only almost infallible, so it makes a mistake once in every million predictions. The demon case is a better example, because I have some credence in EVT, and EVT entails you should one box. I am waaaaaaaaaaaay more confident FDT is crazy than I am that you should two box.
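To make the "almost infallible" point concrete, here is a minimal sketch of why an evidential calculation favors one-boxing. It assumes the textbook Newcomb payoffs ($1,000,000 in the opaque box only if one-boxing was predicted, $1,000 always in the transparent box), which the comment does not state, and a predictor wrong once per million cases; the name `evidential_value` is hypothetical.

```python
# Assumed textbook Newcomb payoffs (not given in the comment above).
OPAQUE_PRIZE = 1_000_000   # in the opaque box only if one-boxing was predicted
TRANSPARENT_PRIZE = 1_000  # always in the transparent box

# Assumption: the predictor errs once in every million cases.
ACCURACY = 1 - 1e-6


def evidential_value(choice: str, accuracy: float) -> float:
    """Expected payoff conditional on making `choice`, treating the choice
    as evidence about what the predictor foresaw (the evidential view)."""
    if choice == "one":
        # Predictor right: opaque box full. Predictor wrong: it is empty.
        return accuracy * OPAQUE_PRIZE + (1 - accuracy) * 0
    if choice == "two":
        # Predictor right: opaque box empty, so only the transparent prize.
        # Predictor wrong: both prizes.
        return (accuracy * TRANSPARENT_PRIZE
                + (1 - accuracy) * (OPAQUE_PRIZE + TRANSPARENT_PRIZE))
    raise ValueError(f"unknown choice: {choice!r}")


if __name__ == "__main__":
    for choice in ("one", "two"):
        print(choice, round(evidential_value(choice, ACCURACY), 2))
```

Under these assumptions one-boxing comes to about $999,999 and two-boxing to about $1,001, which is why a small credence in an evidential theory already pulls toward one-boxing.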

I would agree with the statement "if Eliezer followed his decision theory, and the world was such that one frequently encountered lots of Newcomb's problems and similar cases, you'd end up with more utility." I think my position is fairly close to MacAskill's in the linked post, where he says that FDT is better as a theory of the agent you should want to be than as a theory of what's rational.

But I think that rationality won't always benefit you. I think you'd agree with that. If there's a demon who tortures everyone who believes FDT, then believing FDT, which you'd regard as rational, would make you worse off. If there's another demon who will secretly torture you if you one box, then one boxing is bad for you! It's possible to make up contrived scenarios that punish being rational, and Newcomb's problem is a good example of that.

Notably, if we're in the twin scenario or the scenario that tortures FDTists, CDT will dramatically beat FDT.  

I think the example that's most worth focusing on is the demon case where you cut off your legs. I think it's not crazy at all to one box, and I have maybe 35% credence that one boxing is right. I have maybe 95% credence that you shouldn't cut off your legs in the demon case, and 80% confidence that the position that you may do so is crazy, in the sense that if you spent years thinking about it while being relatively unbiased, you'd almost certainly give it up.

I know you said you didn't want to repeatedly go back and forth, but . . . 

Yes, I agree that if you have some psychological mechanism by which you can guarantee that you'll follow through on future promises, like programming an AI, then that's worth it. It's better to be the kind of agent who follows FDT (in many cases). But the way I'd think about this is that it's an example of rational irrationality: it's rational to try to get yourself to do something irrational in the future because you get rewarded for it. But remember, decision theories are theories about what's rational, not theories about what kind of agent you should be.

I think we both agree with all three of the following claims:

  1. If you have some way to commit in advance to follow FDT in cases like the demon case or the bomb case, you should do so.  
  2. Once you are in those cases, you have most reason to defect.  
  3. Given that you can predict that you'll have most reason to defect, you can sort of psychologically make a deal with your future self where you say "NO REALLY, DON'T DEFECT, I'M SERIOUS."  

My claim, though, is that decision theory is about 2, rather than 1 or 3. No one disputes that the kinds of agents who two box do worse than the kinds of agents who one box; the question is about what you should do once you're in that situation.
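The gap between "which agents do better" and "what to do once you're in the situation" can be sketched with a small payoff table. This again assumes the textbook Newcomb payoffs, which the comment does not state, and the helpers `payoff` and `dominance_gap` are hypothetical. With the opaque box's contents already fixed, two-boxing pays exactly $1,000 more in either state, yet a typical one-boxer (facing a full box) ends up far ahead of a typical two-boxer (facing an empty one).

```python
# Assumed textbook Newcomb payoffs (not given in the comment above).
OPAQUE_PRIZE = 1_000_000
TRANSPARENT_PRIZE = 1_000


def payoff(choice: str, opaque_full: bool) -> int:
    """Payoff after the prediction is made and the boxes are already filled."""
    if choice not in ("one", "two"):
        raise ValueError(f"unknown choice: {choice!r}")
    total = OPAQUE_PRIZE if opaque_full else 0
    if choice == "two":
        total += TRANSPARENT_PRIZE
    return total


def dominance_gap(opaque_full: bool) -> int:
    """Extra payoff from two-boxing, holding the box contents fixed (claim 2)."""
    return payoff("two", opaque_full) - payoff("one", opaque_full)


if __name__ == "__main__":
    for full in (True, False):
        print(f"opaque box full={full}: two-boxing gains {dominance_gap(full)}")
    # Comparing agent types under accurate prediction (the undisputed point).
    print("typical one-boxer:", payoff("one", True))
    print("typical two-boxer:", payoff("two", False))
```

The first two lines of output concern the choice within a fixed situation, and the last two concern which kind of agent fares better; the disagreement is over which of these a decision theory is supposed to answer.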

If an AI is going to encounter Newcombe's problem a lot, everyone agrees you should program it to one box. 

Well put!  Though one nitpick: I didn't defer to Eliezer much.  Instead, I concluded that he was honestly summarizing the position.  So I assumed physicalism was true because I assumed, wrongly, that he was correctly summarizing the zombie argument. 

Oh sorry, yeah, I misunderstood the point you were making. I agree that you want to be the type of agent who cuts off their legs: you become better off in expectation. But the mere fact that the type of agent who does A rather than B gets more utility on average does not mean that you should necessarily do A rather than B. If you know you are in a situation where doing A is guaranteed to get you less utility than B, you should do B. The question of which agent you should want to be is not the same as the question of which agent is acting rationally. I agree with MacAskill's suggestion that FDT is the result of conflating what type of agent to be with what actions are rational. FDT is close to the right answer for the first and a crazy answer for the second, imo.

Happy to debate someone about FDT.  I'll make a post on LessWrong about it.  

One other point: I know that this will sound like a cop-out, but I think that the FDT stuff is the weakest example in the post. I am maybe 95% confident that FDT is wrong, while I'm 99.9% confident that Eliezer's response to zombies fails and 99.9% confident that he's overconfident about animal consciousness.
