Neel Nanda

1909 · Joined Nov 2019


This was a long time ago so I don't precisely remember, but approx 4 months probably?

Thanks for writing this! That overall seems pretty reasonable, and from a marketing perspective I am much more excited about promoting "weak" longtermism than strong longtermism.

A few points of pushback:

  • I think that to work on AI Risk, you need to buy into AI Risk arguments. I'm unconvinced that buying longtermism first really shifts the difficulty of figuring this point out. And I think that if you buy AI Risk, longtermism isn't really that cruxy. So if our goal is to get people working on AI Risk, marketing longtermism first requires strictly more buy-in (even if each individual step may be a much easier sell)
    • I think that very few people say "I buy the standard AI X-Risk arguments and that this is a pressing thing, but I don't care about future people so I'm going to rationally work on a more pressing problem" - if someone genuinely goes through that reasoning then more power to them!
    • I also expect that people have done much more message testing + refinement on longtermism than AI Risk, and that good framings could do much better - I basically buy the claim that it's a harder sell though
    • Caveat: This reasoning applies more to "can we get people working on AI X-Risk with their careers" more so than things like broad societal value shifting
    • Caveat: Plausibly there's enough social proof that people who care about longtermism start hanging out with EAs and are exposed to a lot of AI Safety memes and get there eventually? And it's a good gateway thing? 
  • I want AI Risk to be a broad tent where people who don't buy longtermism feel welcome. I'm concerned about a mood affiliation problem where people who don't buy longtermism but hear it phrased as an abstract philosophical problem that requires you to care about the 10^30 future people won't want to work on it, even though they buy the object level. This kind of thing shouldn't hinge on your conclusions in contentious questions in moral philosophy!
  • More speculatively: It's much less clear to me that pushing on things like general awareness of longtermism or long-term value change matters in a world with <20 year AI timelines? I expect the world to get super weird after that, where more diffuse forms of longtermism don't matter much. Are you arguing that this kind of value change over the next 20 years makes it more likely that the correct values are loaded into the AGI, and that's how it affects the future? 

OK, that seems like a pretty reasonable position. Though if we're restricting ourselves to everyday situations it feels a bit messy - naive utilitarianism implies things like lying a bunch or killing people in contrived situations, and I think the utility-maximising decision is actually to be somewhat of a deontologist.

More importantly though, people do use utilitarianism in contexts with very large amounts of utility and small probabilities - see strong longtermism and the astronomical waste arguments. I think this is an important and action-relevant thing, influencing a bunch of people in EA, and that criticising this is a meaningful critique of utilitarianism, not a weird contrived thought experiment.

I'm pretty confused about the argument made by this post. Pascal's Mugging seems like a legitimately important objection to expected value based decision theory, and all of these thought experiments are basically flavours of that. This post feels like it's just imposing scorn on that idea without making an actual argument? 

I think "utilitarianism says seemingly weird shit when given large utilities and tiny probabilities" is one of the most important objections. 
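To make the "large utilities and tiny probabilities" point concrete, here's a toy sketch of the Pascal's mugging arithmetic (the specific numbers are illustrative and my own, not from any of the posts discussed):

```python
# Toy Pascal's mugging arithmetic: a naive expected-value maximiser pays
# the mugger whenever p * promised_utility > cost, no matter how small p is.
cost = 10.0              # utility lost by paying the mugger
p = 1e-15                # credence that the mugger's promise is real
promised_utility = 1e30  # e.g. "10^30 future people"

# Expected value of paying: the astronomically large payoff swamps the
# astronomically small probability.
ev_pay = p * promised_utility - cost
print(ev_pay > 0)  # True: pure EV reasoning says to pay
```

The structure of the problem is that no matter how small you make `p`, a sufficiently motivated mugger can name a `promised_utility` large enough to dominate the calculation.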

Is your complaint that this is an isolated demand for rigor? 

Note that OpenAI became a capped-profit company in 2019 (2 years into this grant), which I presume made them a much less cost-effective thing to invest in, since they had much better alternative funding sources

If you're ever running an event that you are not excited to be part of, something has gone wrong

This seems way too strong to me. Eg, reasonable and effective intro talks feel like they wouldn't be much fun for me to do, yet seem likely high value

Really excited to see this post come out! I think this is a really helpful guide to people who want to work on AI Alignment, and would have been pretty useful to me in the past. 

This felt like an unusually high quality post in the genre of 'stuff I buy and use', thanks for writing it! I particularly appreciate the nutrition advice, plus actual discussion of your reasoning and epistemic confidences

I did a pure maths undergrad and recently switched to doing mechanistic interpretability work - my day job isn't exactly doing maths, but I find it has a strong aesthetic appeal in a similar way. My job is not to train an ML model (with all the mess and frustration that involves), it's to take a model someone else has trained, and try to rigorously understand what is going on with it. I want to take some behaviour I know it's capable of and understand how it does that, and ideally try to decompile the operations it's running into something human understandable. And, fundamentally, a neural network is just a stack of matrix multiplications. So I'm trying to build tools and lenses for analysing this stack of matrices, and converting it into something understandable. Day-to-day, this looks like having ideas for experiments, writing code and running them, getting feedback and iterating, but I've found a handful of times where having good intuitions around linear algebra, or how gradients work, and spending some time working through algebra has been really useful and clarifying. 

If you're interested in learning more, Zoom In is a good overview of a particular agenda for mechanistic interpretability in vision models (which I personally find super inspiring!), and my team wrote a pretty mathsy paper giving a framework to break down and understand small, attention-only transformers (I expect the paper to only make sense after reading an overview of autoregressive transformers like this one). If you're interested in working on this, there are currently teams at Anthropic, Redwood Research, DeepMind and Conjecture doing work along these lines!
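As a flavour of the "stack of matrices" point, here's a minimal toy sketch (my own illustration, not taken from any of the papers linked above): two linear layers compose into a single matrix, and simple linear-algebra lenses like the SVD let you read off structural facts about the composed map directly from the weights.

```python
import numpy as np

# A toy "network" that is literally a stack of matrix multiplications:
# the two layers W2 @ W1 compose into one linear map we can study directly.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 4
W1 = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_model)
W2 = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_hidden)

# The composed map the two layers implement:
W_eff = W2 @ W1  # shape (d_model, d_model)

# One simple "lens": the singular values reveal the effective rank.
# The composition is a bottleneck, so its rank is at most d_hidden -
# a structural fact you can read straight off the weights.
singular_values = np.linalg.svd(W_eff, compute_uv=False)
effective_rank = int(np.sum(singular_values > 1e-10))
print(effective_rank)  # at most d_hidden = 4
```

Real interpretability work on transformers is of course much more involved than this, but the same move - composing weight matrices and analysing the result with linear algebra - is central to the attention-only framework paper mentioned above.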

the reason the "longtermists working on AI risk" care about the total doom in 15 years is because it could cause extinction and preclude the possibility of a trillion-happy-sentient-beings in the long term. Not because it will be bad for people alive today.

As a personal example, I work on AI risk and care a lot about harm to people alive today! I can't speak for the rest of the field, but I think the argument for working on AI risk goes through if you just care about people alive today and hold beliefs which are common in the field - see this post I wrote on the topic, and a post by Scott Alexander on the same theme.
