Scott Alexander

3133 karma

Comments
47

I'm having trouble understanding this. The part that comes closest to making sense to me is this summary:

The fact that life has survived so long is evidence that the rate of potentially omnicidal events is low...[this and the anthropic shadow effect] cancel out, so that overall the historical record provides evidence for a true rate close to the observed rate.

Are they just applying https://en.wikipedia.org/wiki/Self-indication_assumption_doomsday_argument_rebuttal to anthropic shadow without using any of the relevant terms, or is it something else I can't quite get?

Also, how would they respond to the fine-tuning argument? That is, it seems like most planets (let's say 99.9%) cannot support life (eg because they're too close to their sun). It seems fantastically surprising that we find ourselves on a planet that does support life, but anthropics provides an easy way out of this apparent coincidence. That is, anthropics tells us that we overestimate the frequency of things that allow us to be alive. This seems like reverse anthropic shadow, where anthropic shadow is underestimating the frequency of things that cause us to be dead. So is the paper claiming that anthropics does change our estimates of the frequency of good things, but can't change our estimate of the frequency of bad things? Why would this be?
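To make the selection effect in the fine-tuning argument concrete, here is a minimal sketch. It uses the illustrative 99.9% figure from above and assumes, purely for illustration, that observers arise only on habitable planets; the function and parameter names are hypothetical, not anything from the paper.

```python
from fractions import Fraction


def observed_habitable_fraction(true_habitable_fraction,
                                observers_per_habitable=1,
                                observers_per_barren=0):
    """Fraction of observers who find themselves on a habitable planet.

    Assumption (illustrative only): each planet type produces a fixed
    number of observers, and barren planets produce none by default.
    """
    habitable_observers = true_habitable_fraction * observers_per_habitable
    barren_observers = (1 - true_habitable_fraction) * observers_per_barren
    total = habitable_observers + barren_observers
    if total == 0:
        raise ValueError("no observers exist under these assumptions")
    return habitable_observers / total


# 99.9% of planets cannot support life, so 1 in 1000 can.
true_rate = Fraction(1, 1000)
observed = observed_habitable_fraction(true_rate)

print("true fraction of habitable planets:", true_rate)
print("fraction of observers on habitable planets:", observed)
```

Under these assumptions every observer sees a habitable planet even though almost no planets are habitable, which is the sense in which a naive observer overestimates the frequency of life-permitting conditions.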

I mostly agree with this. The counterargument I can come up with is that the best AI think tanks right now are asking for grants in the range of $2 - $5 million and seem to be pretty influential, so it's possible that a grantmaker who got $8 million could improve policy by 5%, in which case it's correct to equate those two. 

I'm not sure how that fits with the relative technical/policy questions.

Yes, I added them partway through after thinking about the question set more.

The article was obviously terrible, and I hope the listed mistakes get corrected, but I haven't seen a request for correction on the claim that CFAR/Lightcone has $5 million of FTX money and isn't giving it back. Is there any more information on whether this is true and, if so, what their reasoning is?

I think this is more over-learning and institutional scar tissue from FTX. The world isn't divided into Bad Actors and Non-Bad-Actors such that the Bad Actors are toxic and will destroy everything they touch.

There's increasing evidence that Sam Altman is a cut-throat businessman who engages in shady practices. This also describes, for example, Bill Gates and Elon Musk, both of whom also have other good qualities. I wouldn't trust either of them to single-handedly determine the fate of the world, but they both seem like people who can be worked with in the normal paradigm of different interests making deals with each other while appreciating a risk of backstabbing.

I think "Sam Altman does shady business practices, therefore all AI companies are bad actors and alignment is impossible" is a wild leap. We're still in the early (maybe early middle) stages of whatever is going to happen. I don't think this is the time to pick winners and put all eggs in a single strategy. Besides, what's the alternative? Policy? Do you think politicians aren't shady cut-throat bad actors? That the other activists we would have to work alongside aren't? Every strategy involves shifting semi-coalitions with shady cut-throat bad actors of some sort or another, you just try to do a good job navigating them and keep your own integrity intact.

If your point is "don't trust Sam Altman absolutely to pursue our interests above his own", point taken. But there are vast gulfs between "don't trust him absolutely" and "abandon all strategies that come into contact with him in any way". I think the middle ground here is to treat him approximately how I think most people here treat Elon Musk. He's a brilliant but cut-throat businessman who does lots of shady practices. He seems to genuinely have some kind of positive vision for the world, or want for PR reasons to seem like he has a positive vision for the world, or have a mental makeup incapable of distinguishing those two things. He's willing to throw the AI safety community the occasional bone when it doesn't interfere with business too much. We don't turn ourselves into the We Hate Elon Musk movement or avoid ever working with tech companies because they contain people like Elon Musk. We distance ourselves from him enough that his PR problems aren't our PR problems (already done in Sam's case; thanks to the board the average person probably thinks of us as weird anti-Sam-Altman fanatics), describe his positive and negative qualities honestly if asked, try to vaguely get him to take whatever good advice we have that doesn't conflict with his business too much, and continue having a diverse portfolio of strategies at any given time. Or, I mean, part of the shifting semi-coalitions is that if some great opportunity to get rid of him comes, we compare him to the alternatives and maybe take it. But we're so far away from having that alternative that pining after it is a distraction from the real world.

I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.

Eliezer's argument is that you can become the kind of entity that is programmed to do X, by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you made with "your free will"). But it sounds like we're in fact positing that demons are that good - I don't know how to explain how they have a 999,999-in-a-million success rate otherwise - so I think he is right.

I don't think the demon being wrong one in a million times changes much. 999,999 of the people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you're the one who isn't, you can observe that you're the demon's rare mistake and avoid cutting off your legs, but this just means you won the lottery - it's not a generally winning strategy.
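The "lottery" point can be put as a rough expected-utility comparison. This is only a sketch under stated assumptions: the 999,999-in-a-million accuracy comes from the discussion above, but the utility numbers, the payoff structure (the demon creates you only if it predicts you will cut off your legs, and not existing is worth 0), and all names are hypothetical choices for illustration.

```python
from fractions import Fraction

# Demon's predictive accuracy, as posited in the discussion above.
ACCURACY = Fraction(999_999, 1_000_000)

# Hypothetical utilities, chosen only for illustration.
VALUE_OF_EXISTING = 100  # assumed value of being created at all
COST_OF_LEGS = 10        # assumed cost of cutting off your legs


def expected_utility(policy_cuts_legs):
    """Expected utility of a policy, evaluated before creation.

    Assumption: the demon creates you only if it predicts you will cut
    off your legs, and never being created has utility 0.
    """
    if policy_cuts_legs:
        # Created whenever the demon predicts correctly; then pay the cost.
        p_created = ACCURACY
        payoff = VALUE_OF_EXISTING - COST_OF_LEGS
    else:
        # Created only when the demon makes its rare mistake.
        p_created = 1 - ACCURACY
        payoff = VALUE_OF_EXISTING
    return p_created * payoff


print("policy: cut legs  ->", float(expected_utility(True)))
print("policy: keep legs ->", float(expected_utility(False)))
```

With these made-up numbers, the leg-keeping agent does better only in the one-in-a-million branch where it exists at all, so the policy as a whole loses by a wide margin.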

Decision theories are intended as theories of what is rational for you to do.  So it describes what choices are wise and which choices are foolish. 

I don't understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.

On the Newcomb's Problem post, Eliezer explicitly said that he doesn't care why other people are doing decision theory, he would like to figure out a way to get more utility. Then he did that. I think if you disagree with his goal, you should be arguing "decision theory should be about looking good, not about getting utility" (so we can all laugh at you) rather than saying "Eliezer is confidently and egregiously wrong" and hiding the fact that one of your main arguments is that he said we should try to get utility instead of failing all the time and then came up with a strategy that successfully does that.

I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer's goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your goal.

(what is your goal?)

My opinion on your response to the demon question is "The demon would never create you in the first place, so who cares what you think?" That is, I think your formulation of the problem includes a paradox - we assume the demon is always right, but also, that you're in a perfect position to betray it and it can't stop you. What would actually happen is the demon would create a bunch of people with amputation fetishes, plus me and Eliezer who it knows wouldn't betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem - that is, that the demon has failed its omniscience check and created you who are destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.

I don't think this is as degenerate a case as "a demon will torture everyone who believes FDT". If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.

I guess any omniscient demon reading this to assess my ability to precommit will have learned I can't even precommit effectively to not having long back-and-forth discussions, let alone cutting my legs off. But I'm still interested in where you're coming from here since I don't think I've heard your exact position before.

Have you read https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality ? Do you agree that this is our crux?

Would you endorse the statement "Eliezer, using his decision theory, will usually end up with more utility than me over a long life of encountering the sorts of weird demonic situations decision theorists analyze, I just think he is less formally-rational"?

Or do you expect that you will, over the long run, get more utility than him?

Sorry if I misunderstood your point. I agree this is the strongest objection against FDT. I think there is some sense in which I can become the kind of agent who cuts off their legs (ie by choosing to cut off my legs), but I admit this is poorly specified.

I think there's a stronger case for, right now, having heard about FDT for the first time, deciding I will follow FDT in the future. Various gods and demons can observe this and condition on my decision, so when the actual future comes around, they will treat me as an FDT-following agent rather than a non-FDT-following agent. Even though future-created-me isn't exactly in a position to influence the (long-since gone) demon, current me is in a position to make this decision for future relevant situations, and should decide to follow FDT in general. Part of this decision I've made involves being the kind of person who would take the FDT option in hypothetical scenarios.

Then there's the additional question of whether to defect against the demons/gods later, and say "Haha, back in August 2023 I resolved to become an FDT agent, and I fooled you into believing me, but now that I've been created I'm just going to not cut off my legs after all". I think of this as - suppose every past being created by the demon has cut off its legs, ie the demon has a 100% predictive success rate over millions of cases. So the demon would surely predict if I would do this. That means I should (now) try really hard not to do this. Cf. Parfit's Hitchhiker. Can I bind my future self like this? I think empirically yes - I think I have enough honor that if I tell hypothetical demon gods now that I'm going to do various things, I can actually do them when the time comes. This will be "irrational" in some sense, but I'll still end up with more utility than everyone else. 

Is there some sense in which, if I decide not to cut off my legs, I would wink out of existence? I admit feeling a superstitious temptation to believe this (a non-superstitious justification might be wondering if I'm the real me, or a version of me in the omniscient demon's simulation to predict what I would do). I think the literal answer is no but that it's practically useful to keep my superstitious belief in this to allow myself to do the irrational thing that gets me more utility. But this is a weird enough sidetrack that I'm really not sure I'm still in normal Eliezer-approved-decision-theory-land at all.

I think an easier question is whether you should program an AI to always keep its pre-emptive bargains with gods and demons; here the answer is just straightforwardly yes. You don't have to assume that your actions alter your algorithm, you can just alter the algorithm directly. I think this is what Eliezer is most interested in, though I'm not sure.
