Benjamin Hilton

Research Analyst @ 80,000 Hours


Benjamin is a research analyst at 80,000 Hours. Before joining 80,000 Hours, he worked for the UK Government and did some economics and physics research.


I agree with (a). I disagree that (b) is true! And as a result I disagree that existing CEAs give you an accurate signpost.

Why is (b) untrue? Well, we do have some information about the future, so it seems extremely unlikely that you'd have no indication at all of the sign of your actions, provided you do (a) reasonably well.

Again, I don't purely mean this from an extreme longtermist perspective (although I would certainly be interested in longtermist analyses given my personal ethics). For example, simply thinking about population changes in the above report would be one way to move in this direction. Other possibilities include thinking about the effects of GHW interventions on long-term trajectories, like growth in developing countries (and that these effects may dominate short-term effects like DALYs averted for the very best interventions). I haven't thought much about what other things you'd want to measure to make these estimates, but I would love to see someone try, and it seems pretty crucial if you're going to be doing accurate CEAs.

Sure, happy to chat about this!

Roughly, I think that you are currently not really calculating cost-effectiveness. That is, whether you're giving out malaria nets or preventing nuclear war, almost all of the effects of your actions will affect people in the future.

To clarify, by "future" I don't necessarily mean "long run future". Where you put that bar is a fascinating question. But focusing on current lives lost seems to approximately ignore most of the (positive or negative) value, so I expect your estimates to not be capturing much about what matters.

(You've probably seen this talk by Greaves, but flagging it in case you haven't! Sam isn't a huge fan, I think in part because Greaves reinvents a bunch of stuff that non-philosophers have already thought a bunch about, but I think it's a good intro to the problem overall anyway.)

I'm curious about the ethical decisions you've made in this report. What's your justification for evaluating current lives lost? I'd be far more interested in cause-X research that considers a variety of worldviews, e.g. a number of different ways of evaluating the medium or long-term consequences of interventions.

I agree that I'd love to see more work on this! (And I agree that the last story I talk about, of a very fast takeoff AI system with particularly advanced capabilities, seems unlikely to me - although others disagree, and think this "worst case" is also the most likely outcome.)

It's worth noting again though that any particular story is unlikely to be correct. We're trying to forecast the future, and good ways of forecasting should feel uncertain at the end, because we don't know what the future will hold. Also, good work on this will (in my opinion) give us ideas about what many possible scenarios will look like. This sort of work (e.g. the first half of this article, rather than the second) often feels less concrete, but is, I think, more likely to be correct - and can inform actions that target many possible scenarios rather than one single unlikely event.

All that said, I'm excited to see work like OpenPhil's nearcasting project which I find particularly clarifying and which will, I hope, improve our ability to prevent a catastrophe.

That particular story, in which I write "one day, every single person in the world suddenly dies", is about a fast takeoff self-improvement scenario. In such scenarios, a sudden takeover is exactly what we should expect to occur, and the intermediate steps set out by Holden and others don't apply to such scenarios. Any guessing about what sort of advanced technology would do this necessarily makes the scenario less likely, and I think such guesses (e.g. "hypnodrones") are extremely likely to be false and aren't useful or informative.

For what it's worth, I personally agree that slow takeoff scenarios like those described by Holden (or indeed those I discuss in the rest of this article) are far more likely. That's why I focus on many different ways in which an AI could take over - rather than on any particular failure story. And, as I discuss, any particular combination of steps is necessarily less likely than the claim that any or all of these capabilities could be used.

But a significant fraction of people working on AI existential safety disagree with both of us, and think that a story which literally claims that a sufficiently advanced system will suddenly kill all humans is the most likely way for this catastrophe to play out! That's why I also included a story which doesn't explain these intermediate steps, even though my inside view is that this is less likely to occur.

Yeah, it’s a good question! Some thoughts:

  • I’m being quite strict with my definitions. I’m only counting people working directly on AI safety. So, for example, I wouldn’t count the time I spent writing this profile on AI (or, for that matter, anyone else who works at 80k). (Note: I do think lots of relevant work is done by people who don’t work directly on it.) I’m also not counting people who think of themselves as on an AI safety career path and are, at the moment, skilling up rather than working directly on the problem. There are some ambiguities - e.g. is the ops team of an AI org working on safety? In general, though, these ambiguities seem much smaller than the error in the data itself.

  • AI safety is hugely neglected outside EA (which is a key reason why it seems so useful to work on). This isn't a big surprise, and may in large part be a result of the fact that it used to be even more neglected - which means that anything started as an AI safety org is likely to have been started by EAs, so is also seen as an EA org. That makes AI safety a subset of EA rather than the other way round.

  • Also, I'm looking at AI existential safety rather than broader AI ethics or AI safety issues. The focus on x-risk (combined with reasons to think that lots of work on AI non-existential safety isn't that relevant - as compared with e.g. bio, where lots of policy work is relevant to both major pandemics and existential pandemics) makes it even more likely that this is just looking at a strict subset of EAs.

  • There are, I think, up to around 10,000 engaged EAs - of those, maybe 1,000-2,000 are longtermist or x-risk focused. So we're looking at roughly 10% of these people working full-time on AI x-risk! That seems like a pretty high proportion to me, given the various causes in the wider EA (not even longtermist) community.

  • So in many ways the question of "how are so few people working on AI safety after 10 years" is similar to "how are there so few EAs after 10 years", which is a pretty complicated question. But it seems to me like EA is way way way bigger and more influential than I would ever have expected in 2012!

  • There are also some other bottlenecks (notably mentoring capacity). The field was nearly non-existent 10 years ago, with very few senior people to help others enter the field – and it’s (rightly) a very technical field, focused on theoretical and practical computer science / ML. Even now, the proportion of time those 300 people should be spending mentoring is very much unclear to me.

I'd also like to highlight the footnote alongside this number: "There’s a lot of subjective judgement in the estimate (e.g. “does it seem like this research agenda is about AI safety in particular?”), and it could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. My 90% confidence interval would range from around 100 people to around 1,500 people."

Hi Gideon,

I wrote the 80,000 Hours problem profile on climate change. Thank you so much for this feedback! I’m genuinely really grateful to see such engagement with the things I write - and criticism is always a welcome contribution to making sure that I’m saying the right things.

Just to be clear, when I said “we think it’s potentially harmful to do work that could advance solar geoengineering”, I meant that (with a fair degree of uncertainty) it could be harmful to do work that advances the technology (which I think you agree with) - not that all research around the topic seems bad! It definitely seems plausible that some research on the topic might be good - but I was trying to recommend the very best things to do to mitigate climate change. My reviewers pretty much all agreed that, partly as a result of potential harmful effects, it doesn’t seem like SRM research would be one of those very best things, and so suggested that we stop recommending working in the area. In large part I’m deferring to this consensus among the reviewers.

Hope that helps!


I think these are all great points! We should definitely worry about negative effects of work intended to do good. 

That said here are two other places where maybe we have differing intuitions:

  • You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign. 
  • It seems hard to conclude that the counterfactual where any one or more of "no work on AI safety / no interpretability work / no robustness work / no forecasting work" were true is in fact a world with less x-risk from AI overall. That is, while I can see there are potential negative effects of these things, when I truly try to imagine the counterfactual, the overall impact seems likely positive to me.

Of course, intuitions like these are much less concrete than actually trying to evaluate the claims, and I agree it seems extremely important for people evaluating or doing anything in AI safety to ensure they're doing positive work overall.