217 · Joined Apr 2022


Ah, sure. The justification is something like this: there are many people who would be great at running FHI and would want to, and I'm guessing at least one of them would be much better for optics and would make people more comfortable than Bostrom. Replacing him with one of these just-as-capable people seems to have few downsides and several upsides.

Hm, fairly confused by the downvotes. I'm guessing a) people disagree with this being a good decision or b) there's a really obvious answer? If it's b), can you please tell me?

This makes more sense. I still feel a bit irked by the downvotes though. I would like people to be aware of the email, and I feel much more strongly about that than about not wanting people to see some of pseudonym's takes about the apology.

While I agree that these kinds of "bad EA optics" posts are generally unproductive and it makes sense for them to get downvoted, I'm surprised that this specific one isn't getting more upvotes. Unlike most links to hit pieces and criticisms of EA, this post actually contains new information that has changed my perception of EA and EA leadership.

with less intensity, we should discourage the framing of 'auditing' very established journalists for red flags

Why? If I was deciding whether to be interviewed by Rachel, probably the top thing I'd be worried about is whether she has previously written not-very-journalistic hit pieces on tech-y people (which is not all critical pieces in general! Some are pretty good and well researched). I agree that there's such a thing as going too far, but I don't think my comment was doing that.

I think "there are situations this is valid (but not for the WSJ!)" is wrong? There have been tons of examples of kind of crap articles in usually highly credible newspapers. For example, this article in the NYT seemed to be pretty wrong and not that good.

I think it makes more sense to look at articles that Rachel has written about SBF/EA. Here's one:


I (very briefly) skimmed it and didn't see any major red flags. 

Not an answer, but why are you trying to do this? If you're excited about Biology, there seem to be plenty of ways to do impactful biological work. 

Even if you're purely trying to maximize your impact, for areas like AI Alignment, climate change, or bioweapons, the relevant question is something like: what is the probability that me working on this area prevents a catastrophic event? According to utilitarianism, your expected number of lives saved is basically this probability times the total number of people who will ever live, or something like this.

So if there's a 10% chance of AI killing everyone, and you working on this brings it down to 9.999999%, this is less impactful than if there's a 0.5% chance of climate change killing everyone, and you working on this brings it down to 0.499%. The first is a reduction of 0.000001 percentage points; the second is a reduction of 0.001 percentage points, which is 1,000 times larger. Since you're much more likely to do impactful work in an area that excites you, bio seems solid, since it's relevant to bioweapons and climate change?
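For anyone who wants to check the arithmetic, here is a rough sketch of the comparison. The function name is made up for illustration, and it assumes the same number of future lives is at stake in both scenarios, so that number cancels out and only the change in probability matters.

```python
def risk_reduction(p_before_pct: float, p_after_pct: float) -> float:
    """Absolute drop in catastrophe probability; inputs are in percent."""
    return (p_before_pct - p_after_pct) / 100.0


# Numbers taken from the comment above.
ai_delta = risk_reduction(10.0, 9.999999)    # roughly 1e-8
climate_delta = risk_reduction(0.5, 0.499)   # roughly 1e-5

# Expected lives saved = delta * (lives at stake). With the same lives at
# stake in both cases (an assumption), the ratio of deltas is the ratio
# of expected impact.
ratio = climate_delta / ai_delta
print(f"AI risk reduction:      {ai_delta:.2e}")
print(f"Climate risk reduction: {climate_delta:.2e}")
print(f"Climate / AI impact ratio: {ratio:.0f}")
```

Under these made-up numbers, the climate work comes out about a thousand times more impactful, even though the baseline risk is much lower.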

Ok, cool, that's helpful to know. Is your intuition that these examples will definitely occur and we just haven't seen them yet (due to model size or something like this)? If so, why?

Right, so I'm pretty on board with the claim that optimal policies (i.e., "global maximum" policies) usually involve seeking power. However, gradient descent only finds local maximums, not global maximums. It's unclear to me whether these local maximums would involve something like power-seeking. My intuition for why this might not be the case is that "small tweaks" in the direction of power-seeking would probably not reap immediate benefits, so gradient descent wouldn't go down this path.
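To make the local-versus-global point concrete, here is a toy sketch of gradient ascent on a one-dimensional objective with two peaks. The objective, learning rate, and starting points are all invented for illustration, and this says nothing about power-seeking itself; it only shows that where the optimizer ends up depends on where it starts, because each step follows the immediate local slope.

```python
def objective(x: float) -> float:
    # Toy function with a lower peak near x = -0.93 and a higher peak
    # near x = 1.06, separated by a dip.
    return -(x**2 - 1) ** 2 + 0.5 * x


def gradient(x: float) -> float:
    # Analytic derivative of the toy objective above.
    return -4 * x**3 + 4 * x + 0.5


def gradient_ascent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Follow the local slope uphill from x0 and return the final point."""
    x = x0
    for _ in range(steps):
        x += lr * gradient(x)
    return x


x_left = gradient_ascent(-1.5)   # starts in the basin of the lower peak
x_right = gradient_ascent(0.5)   # starts in the basin of the higher peak

print(f"start -1.5 -> x = {x_left:.3f}, value = {objective(x_left):.3f}")
print(f"start  0.5 -> x = {x_right:.3f}, value = {objective(x_right):.3f}")
```

The run that starts at -1.5 settles on the lower peak and never reaches the higher one, because getting there would require first moving through a region where the objective gets worse.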

This is where my question kind of arose from. If you have empirical examples of power-seeking coming up in tasks where it's nontrivial that it would come up, I'd find that particularly helpful.

Does the paper you sent address this? If so, I'll spend more time reading it.  

Thanks! I think most of this made sense to me. I'm a bit fuzzy on the fourth bullet. Also, I'm still confused why a model would even develop an alternative goal to maximizing its reward function, even if it's theoretically able to pursue one. 
