This was a long time ago so I don't precisely remember, but probably around 4 months?
Thanks for writing this! That overall seems pretty reasonable, and from a marketing perspective I am much more excited about promoting "weak" longtermism than strong longtermism.
A few points of pushback:
OK, that seems like a pretty reasonable position. Though if we're restricting ourselves to everyday situations it feels a bit messy - naive utilitarianism implies things like lying a bunch or killing people in contrived situations, and I think the utility maximising decision is actually to be somewhat deontological.
More importantly though, people do use utilitarianism in contexts with very large amounts of utility and small probabilities - see strong longtermism and the astronomical waste arguments. I think this is an important and action-relevant thing, influencing a bunch of people in EA, and that criticising this is a meaningful critique of utilitarianism, not a weird contrived thought experiment.
I'm pretty confused about the argument made by this post. Pascal's Mugging seems like a legitimately important objection to expected value based decision theory, and all of these thought experiments are basically flavours of that. This post feels like it's just pouring scorn on that idea without making an actual argument?
I think "utilitarianism says seemingly weird shit when given large utilities and tiny probabilities" is one of the most important objections.
Is your complaint that this is an isolated demand for rigor?
Note that OpenAI became a capped-profit company in 2019 (2 years into this grant), which I presume made them a much less cost-effective thing to invest in, since they had much better alternative funding sources.
If you're ever running an event that you are not excited to be part of, something has gone wrong
This seems way too strong to me. E.g., reasonable and effective intro talks feel like they wouldn't be much fun for me to do, yet seem likely to be high value.
Really excited to see this post come out! I think this is a really helpful guide to people who want to work on AI Alignment, and would have been pretty useful to me in the past.
This felt like an unusually high quality post in the genre of 'stuff I buy and use', thanks for writing it! I particularly appreciate the nutrition advice, plus the actual discussion of your reasoning and epistemic confidence.
I did a pure maths undergrad and recently switched to doing mechanistic interpretability work - my day job isn't exactly doing maths, but I find it has a strong aesthetic appeal in a similar way. My job is not to train an ML model (with all the mess and frustration that involves), it's to take a model someone else has trained, and try to rigorously understand what is going on with it. I want to take some behaviour I know it's capable of and understand how it does that, and ideally try to decompile the operations it's running into something human understandable. And, fundamentally, a neural network is just a stack of matrix multiplications. So I'm trying to build tools and lenses for analysing this stack of matrices, and converting it into something understandable. Day-to-day, this looks like having ideas for experiments, writing code and running them, getting feedback and iterating, but I've found a handful of times where having good intuitions around linear algebra, or how gradients work, and spending some time working through algebra has been really useful and clarifying.
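To gesture at what I mean concretely, here's a toy sketch (not my actual tooling, and using random weights standing in for a trained model) of treating a small network as a stack of matrices and applying one simple "lens", the singular value decomposition of a weight matrix:

```python
import numpy as np

# Toy illustration only: random weights standing in for a trained model.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(784, 128))  # first layer weight matrix
W2 = rng.normal(size=(128, 10))   # second layer weight matrix

def forward(x):
    """A 2-layer MLP is literally two matrix multiplications plus a ReLU."""
    h = np.maximum(x @ W1, 0)
    return h @ W2

# One simple "lens" on a weight matrix: its singular value decomposition,
# which shows which input/output directions the layer actually uses.
U, S, Vt = np.linalg.svd(W1, full_matrices=False)
top10_frac = (S[:10] ** 2).sum() / (S ** 2).sum()
print(f"Fraction of W1's squared norm in the top 10 directions: {top10_frac:.3f}")
```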
If you're interested in learning more, Zoom In is a good overview of a particular agenda for mechanistic interpretability in vision models (which I personally find super inspiring!), and my team wrote a pretty mathsy paper giving a framework to break down and understand small, attention-only transformers (I expect the paper to only make sense after reading an overview of autoregressive transformers like this one). If you're interested in working on this, there are currently teams at Anthropic, Redwood Research, DeepMind and Conjecture doing work along these lines!
the reason the "longtermists working on AI risk" care about the total doom in 15 years is because it could cause extinction and thereby preclude the possibility of a trillion happy sentient beings in the long term. Not because it will be bad for people alive today.
As a personal example, I work on AI risk and care a lot about harm to people alive today! I can't speak for the rest of the field, but I think the argument for working on AI risk goes through if you just care about people alive today and hold beliefs which are common in the field
- see this post I wrote on the topic, and a post by Scott Alexander on the same theme.