Neel Nanda

Neel Nanda's Comments

My personal cruxes for working on AI safety

Thanks for writing this up! I thought it was really interesting (and this seems like a really excellent talk to give at student groups :) ). I especially liked the arguments about the economic impact of AGI and the focus on what it costs; that's an interesting perspective I haven't heard emphasised elsewhere.

The parts I feel most unconvinced by:

  • The content in Crux 1 seems to argue that AGI will be important when it scales and becomes cheap, because of its economic impact. But the arguments for the actual research being done seem more focused on AGI as a single monolithic thing, eg framings like a safety tax or arms race, or comparing the costs of building an unaligned AGI vs an aligned AGI.
    • My best guess for what you mean is that "If AGI goes well, for economic reasons, the world will look very different and so any future plans will be suspect. But the threat from AGI comes the first time one is made", ie that Crux 1 is an argument for prioritising AGI work over other work, but unrelated to the severity of the threat of AGI - is this correct?
  • The claim that good alignment solutions would be put to use. The fact that so many computer systems put minimal effort into security today seems a very compelling counter-argument.
    • I'm especially concerned if the problems are subtle. My impression is that a lot of what MIRI thinks about sounds weird, in an "I could maybe buy this, but maybe not" way. And I have much lower confidence that companies would invest heavily in security for more speculative, abstract concerns.
      • This seems bad, because intuitively AI Safety research seems more counterfactually useful the more subtle the problems are - I'd expect people to solve obvious problems before deploying AGI even without AI Safety as a field.
    • Related to the first point, I have much higher confidence that AGI would be safe if it's a single, large project (eg a major $100 billion deployment) that people put a lot of thought into, than if it's cheap and used ubiquitously.

"EA residencies" as an outreach activity

I have low confidence in this, but I'm pretty excited about this idea! I've had many more conversations with people super into EA over the last few months, and this has definitely had a major impact on me, especially in getting a better understanding of the ideas and just making things concrete. It took me from "this is some weird abstract stuff" to "these are ideas that some super awesome and smart people believe, and that I could realistically apply in my life or build my career around".

I'm somewhat biased, because I personally much prefer talking to people to eg reading things. I think a large part of it is just really liking the people and finding them interesting. I also got a lot of this value from going to parties and being in an EA social environment, which this wouldn't directly generalise to, but I conjecture that someone explicitly trying to create a good environment for this could do much better?

I'm wondering how much of the value of this could be captured by just having calls with people interested in EA but not at EA Hubs? This seems like it cuts out a lot of the logistical hassle of a residency, though at the cost of not being able to go to meetups, and losing out on the in-person interaction. I think it could capture much of the value of talking to someone highly into EA though.

I think that this probably works much better if the EA in residency isn’t trying to represent all of EA, they’re just trying to represent themselves, as an EA who has opinions about things, and they make it clear that they are not a representative of all of EA. If you do this, you’re less making a claim about your own legitimacy, you make it clearer that you’re not speaking for all of EA (which frees you up to share your nonstandard EA opinions), and people might jump less to the conclusion that all EAs have the same beliefs as you.

This sounds good, but really hard to pull off well. I personally found that "highly dedicated EAs who have spent a lot of time thinking about this sometimes disagree on important points" only really felt visceral to me after having several IRL conversations with smart people who held different viewpoints. And after only talking to one person, it's easy for their view and justifications to dominate, especially if they've thought about it a lot more than I have. Even if they give frequent caveats of "this is just my opinion", I don't think that feels visceral in the same way as talking to somebody really smart.

Suggested patches:

  • Actively try to be balanced in conversations, eg give steelmans for the positions you don't hold
  • Point people towards high quality write-ups of opposing viewpoints

Some further thoughts from previous discussions with Buck:

For 1 on 1 chats with people super into EA (I've had a reasonable amount of experience being on both sides of this), I think one big failure mode is not being sure what to talk about. Eg, if I'm talking to somebody who actively researches an area that interests me, there are obviously a lot of things they know a lot about that I'd find interesting to talk about, but I struggle to come up with good questions to access those. I also expect this to be exacerbated if you're having many conversations with people already somewhat engaged with EA, as you first need to figure out their prior level of context and knowledge. This seems like a difficult problem to solve; a few ideas:

  • Focusing on career conversations, where this seems less of an issue
  • Brainstorming common things people misunderstand and trying to bring those up
  • Having longer conversations and trying to ensure the person in residency is a great conversationalist (this one is much less concrete, but I think the skill of finding worthwhile things to talk about varies a lot between people)

(Being on either side of these conversations and not knowing what to talk about is a problem I frequently run into, so I'd love to hear anyone's suggestions for helping with this generally!)

Another potential failure mode: I'd guess there are a lot of people who might really benefit from a 1 on 1 but who feel socially awkward expressing interest or trying to arrange one, eg due to concern about taking up the person's time, feeling that they're not impressive enough, general social anxiety/aversion to meeting a stranger 1 on 1, etc. My immediate thought for partially resolving this is asking local group organisers for introductions, as a friendlier point of contact? I think it'd also help to put a lot of thought into how to market this, for example whether people need to consider themselves high-achievers/high-potential. I think younger EAs systematically underestimate how much more experienced ones want to talk to them (at least in contexts like this, reaching out to people at EAG, etc).

The situation of "a conversation with somebody you'll probably never see again" is weird, and the way to maximise impact probably differs from how I'd normally approach a conversation, since much of the value will come from things they do on their own afterwards without (much) further prompting. The best levers to pull are probably: suggesting options they wouldn't have considered, eg career paths, or more generally challenging the narrative they frame their life with (though this seems high variance); connecting them with useful people to speak to; Buck's point about understanding their view of the core EA arguments and addressing their objections; and pointing them towards good resources they wouldn't otherwise have found.