Neel Nanda

3846 karma · neelnanda.io


I lead the DeepMind mechanistic interpretability team


Should we keep making excuses for OpenAI, and Anthropic, and DeepMind, pursuing AGI at recklessly high speed, despite the fact that AI capabilities research is far out-pacing AI safety and alignment research?

I don't at all follow your jump from "OpenAI is wracked by scandals" to "other AGI labs bad" - Anthropic and GDM had nothing to do with Sam's behaviour, and Anthropic co-founders actively chose to leave OpenAI. I know you already believed this position, but it feels like you're arguing that Sam's scandals should change other people's position here. I don't see how it gives much evidence either way for how the EA community should engage with Anthropic or DeepMind?

I definitely agree that this gives meaningful evidence on whether eg 80K should still recommend working at OpenAI (or even working on alignment at OpenAI, though that's far less clear cut IMO)

Very strong +1, this is nothing like the SBF situation and there's no need for soul searching of the form "how did the EA community let this happen" in my opinion

Damage control, not defeat IMO. It's not defeat until they free previous leavers from unfair non-disparagement agreements or otherwise make it right to them

Strong +1 to this! Also, entertainingly, I know many of the people in the first episode, and they seemed significantly funnier there than they do in real life - clearly I'm not hanging out with you all in the right settings!

Idk, I do just think that bad faith actors exist, especially in the public sphere. It's a mistake to assume that all critics are in bad faith, but equally it's naive to assume that it's never bad faith

The Wenar criticism in particular seems laughably bad, such that I find bad faith hypotheses like this fairly convincing. I do agree it's a seductive line of reasoning to follow in general though, and that this can be dangerous

I got the OpenPhil grant only after the other grant went through (and wasn't thinking much about OpenPhil when I applied for the other grant). I never thought to inform the other grant maker after I got the OpenPhil grant, which maybe I should have in hindsight out of courtesy?

This was covering some salary for a fixed period of research, partially retroactive, after an FTX grant fell through. So I guess I didn't have use for more than X, in some sense (I'm always happy to be paid a higher salary! But I wouldn't have worked for a longer period of time, so I would have felt a bit weird about the situation)

Without any context on this situation, I can totally imagine worlds where this is reasonable behaviour, though perhaps poorly communicated, especially if SFF didn't know they had OpenPhil funding. I personally had a grant from OpenPhil approved for X, but in the meantime had another grantmaker give me a smaller grant for y < X, and OpenPhil agreed to instead fund me for X - y, which I thought was extremely reasonable.

In theory, you can imagine OpenPhil wanting to fund their "fair share" of a project, evenly split across all other interested grantmakers. But it seems harmful and inefficient to wait for other grantmakers to confirm or deny, so "I'll give you 100%, but lower that to 50% if another grantmaker is later willing to go in as well" seems a more efficient version.

I can also imagine that they eg think a project is good if funded up to $100K, but worse if funded up to $200K (eg that they'd try to scale too fast, as has happened with multiple AI Safety projects that I know of!). If OpenPhil funds $100K, and the counterfactual is $0, that's a good grant. But if SFF also provides $100K, that totally changes the terms, and now OpenPhil's grant is actively negative (from their perspective).

I don't know what the right social norms here are, and I can see various bad effects on the ecosystem from this behaviour in general - incentivising grantees to be dishonest about whether they have other funding, disincentivising other grantmakers from funding anything they think OpenPhil might fund, etc. I think Habryka's suggestion of funging, but not to 100%, seems reasonable and probably better to me.

Omg what, this is amazing (though nested bullets not working does seem to make this notably less useful). Does it work for images?
