I’ve been interested in AI risk for a while and my confidence in its seriousness has increased over time, but I’ve generally harbored some hesitation about believing some combination of short-ish AI timelines[1] and high risk levels[2]. In this post I’ll introspect on what comes out when I try to expand on reasons for this hesitation and categorize the reasons into seeming (likely) unjustified vs. potentially justified.

I use justified as “should affect my credence in AI risk levels, timelines, etc.” and unjustified as the opposite. These categorizations are very tentative: I could imagine myself changing my mind about several considerations.

I also describe my current overall attitude toward the importance of AI risk given these considerations.

Unjustified

Contrarian-within-EA instincts

I have somewhat contrarian instincts and enjoy debating, playing devil’s advocate, etc. It feels boring to agree with the 80,000 Hours ranking of AI risk as the most important problem; it would feel more fun to come up with a contrarian take and try to flesh out the arguments and get people on board. But this doesn’t mean that the contrarian take is right; in fact, given my beliefs about how talented EAs are I should expect the current take to be more likely than the contrarian one before looking into it.

Desire for kids

I’ve always enjoyed spending time with kids and as such have wanted to have kids for as long as I can remember. It’s hard for me to grapple with the idea that my kids’ most likely reason to die young would be AI risk, perhaps by a wide margin. I’ve become more hesitant about my desire to have kids due to a high perceived risk level and also a potential reduction in my productivity during a very important period; I’d want to be able to spend a lot of time with my kids and not treat them as a second priority to my work. This has been tough to swallow.

Uncomfortable about implications for EA

I got into EA via Doing Good Better and was originally excited about the opportunity to clearly save many lives throughout my career. I went vegan due to animal welfare concerns and still feel a lot of intuitive sympathy for the huge amounts of suffering many humans and animals are currently going through. It feels a bit sad to me that as my beliefs have been evolving it’s been hard to deny that there’s a decent chance that AI safety and things that feed into it (e.g. movement building, rationality/epistemics improvement, grantmaking, etc.) have a much higher EV than other activities all else equal.

I might feel more at peace if my beliefs implied higher levels of variance in what the most impactful activities were. Having a relatively diverse and inclusive movement feels important and more fun than one where the most talented people are mostly funneled into the same few activities. Doubly so compared to a focus on AI that feels weird to many people and could be badly mistaken given our level of understanding. And I’d still be reluctant to encourage people who feel very passionate about what they do and are doing useful things to switch to working on AI safety.

But it might just be a fact about the world that AI safety is by a substantial amount the most important cause area, and this is actually consistent with original EA arguments about unexpectedly high differences in impact between cause areas. And how I feel about this fact shouldn’t influence whether I believe it’s true.

Feelings about AI risk figures

I admire Eliezer in a lot of ways but I find it hard to get through his writing given his drawn out style, and he seems overly bombastic to me at times. I haven’t read the sequences though I might at some point. I haven’t gotten past Chapter 1 of HPMOR and probably don’t intend to. And his beliefs about animal suffering seem pretty crazy to me. But my feelings about Eliezer don’t affect how strong the object-level arguments are for AI posing an existential risk.

Worries about bias towards AI and lack of AI expertise

I was pretty interested in machine learning before I found out about EA. This made me suspicious when I started to seriously believe that AI risk was the most important cause area; wasn’t this a bit too fishy? I projected these worries onto others as well, like: isn’t it a coincidence that people who love math concluded that the best way to save the world is by thinking about fun math stuff all day?

Reflection has made me less concerned about this because I realized[3] that I have opposing hesitations depending on if the person worried about AI risk was an AI expert or not. If they were an AI expert, I have the worry described above that the conclusion was too convenient. But for people worried about AI who aren’t AI experts, I had the worry that they didn’t know enough to be worried! So either way I was coming up with a justification to be hesitant. See also Caution on Bias Arguments.

I think there would be more reason for concern if those concerned about AI risk were overwhelmingly either AI experts or AI novices, but in fact it seems like a healthy mix to me (e.g. Stuart Russell is an expert, most of the 80,000 hours team are novices). Given this and my opposing intuitions depending on the advocator, I think these reasons for hesitancy aren’t much of a concern.

EDIT: As pointed out in this comment, it's possible that both experts and novices are biased towards AI because they find it cool/fun.

Doomsaying can’t be vindicated

I’m a competitive guy, and I really like the feeling of being right/vindicated (“I told you so!”). I don’t like the opposite feeling of losing, being wrong and embarrassed, etc. And to a first approximation, doomsaying can’t be vindicated, it can only be embarrassed! In this way I admire MIRI for sticking their neck out with relatively short timelines and high p(doom) with a fast takeoff. They only have the potential to be embarrassed; if they’re right we’ll likely all drop dead with approximately no time for “I told you so!”.

Potentially justified

We have no idea what we’re doing

While “no idea” is hyperbole, recent discussions such as the MIRI conversations have highlighted deep disagreements about the trajectory of AI and which approaches seem promising as a result. Predicting the future seems really hard, and technological predictions are often too aggressive[4]. It’s likely that 20 years from now we’ll look back on the work we’re doing today and think it was very misguided, similar to how we might now look at lots of work from 20 years ago.[5] But note that this unpredictability can cut both ways; it might be hard to rule out short timelines, and some past technological predictions may have been too conservative.

Note that this could also potentially point toward “figuring out what we’re doing” rather than deprioritizing AI risk, depending on views on just how hard it is to figure out what we’re doing. This is basically my current take though I think “trying to actually do stuff” should be a large part of the portfolio of figuring things out.

Many smart people disagree

The post “But have they engaged with the arguments?” points out, in the context of AI risk:

The upshot here seems to be that when a lot of people disagree with the experts on some issue, one should often give a lot of weight to the popular disagreement, even when one is among the experts and the people's objections sound insane. Epistemic humility can demand more than deference in the face of peer disagreement: it can demand deference in the face of disagreement from one's epistemic inferiors, as long as they're numerous. They haven't engaged with the arguments, but there is information to be extracted from the very fact that they haven't bothered engaging with them.

I think this is a legitimate concern and enjoy efforts to seek out and flesh out opinions of generally reasonable people and/or AI experts who think AI risk is misguided. This may be a case where steelmanning is particularly useful. Recent efforts in this direction include Transcripts of interviews with AI researchers and Why EAs are Skeptical about AI Safety.

But I think at a certain point you need to take a stand, and overly modest epistemology has its downsides. I also have the intuition that oftentimes if you want to have a big impact, at some point you have to be willing to follow arguments you believe in even if they’re disputed by many reasonable people. You have to accept the possibility you might be badly mistaken and make the bet.

Expected value of the future

This is a concern with a brand of longtermism in general rather than AI specifically, and note that it might push toward working on AI from more of a suffering-focused perspective (or even mostly doing standard AI risk stuff depending on how much overlap there is) rather than deprioritizing AI stuff.

But I do have some unresolved uncertainties about the expected value of the future; it seems fairly unclear to me though still positive if I had to guess. I’m planning on spending more time thinking about this at some point but for now will just link to some relevant posts here, here, and here. Also related is Holden’s suggestion to explore how we should value long-run outcomes relative to each other.

Bias toward religion-like stories

My concerns are broadly similar to the ones described in this post: it seems like concerns about AI risk follow similar patterns to some religions/cults: AI is coming soon and we’ll probably either enter a ~utopia or all die within our lifetimes, depending on what actions we take.

I don’t think we should update too much on this (the replies to the post above are worth reading and generally convincing imo) but it seems useful to keep in mind. Lots of very impactful groups (e.g. startups) also have some features of cults/religions, so again I feel at some point one has to take a stand on the object-level issues based on their best guess.

Track record of AI risk figures

There are at least some data points of Eliezer being overconfident in the past about technological timelines, which should maybe cause us to downweight his specific assessments a little. Though he has also been fairly right on the general shape of the problem and way ahead of everyone else, so we also need to take that into account.

Not sure I have much more to add here besides linking this more comprehensive post and comment section.

Overall attitude

My best guess is that the high-level argument of the form “We could in principle create AI more intelligent than us, it seems fairly likely it will happen this century, and creating agents more intelligent than us would be a really big deal and could lead to very good or bad outcomes”, similar to the one described here, is basically right and alone implies that AI is an extremely important technology to pay attention to. This plus instrumental convergence plus the orthogonality thesis seem sufficient to make AI the biggest existential risk we know of by a substantial margin.

Over time I’ve become more confident that some of my hesitations are basically unjustified and the others seem more like points for further research than reasons to not treat AI risk as the most important problem. I’d be excited for further discussion and research on some of the potentially justified hesitations, in particular: improving and clarifying our epistemic state, seeking out and better understanding opinions of reasonable people who disagree, and the expected value of the future.

Acknowledgments

Thanks to Miranda Zhang for feedback and discussion. Messy personal stuff that affected my cause prioritization (or: how I started to care about AI safety) vaguely inspired me to write this.


  1. Something like >50% chance of AGI/TAI/APS-AI within 30 years ↩︎

  2. Say, >15% chance of existential catastrophe this century ↩︎

  3. I forget if I actually realized this myself or I first saw someone else make this point, maybe Rob Wiblin on Twitter? ↩︎

  4. Examples: UK experts were overly optimistic, as were cultured meat predictions and (weakly) Metaculus AI predictions ↩︎

  5. I’m actually a bit confused about this though; I wonder how useful MIRI considers its work from 20 years ago to be? ↩︎


11 comments

On the "Worries about bias towards AI and lack of AI expertise" section, can't you also make the argument that everyone finds AI cool, experts and novices alike?

AI novices also find AI cool, and finally, there is a way for them to get into an AI career, associate with a cool community full of funding opportunities even for novices.

I'm surprised by your reason for being skeptical about AI novices on the grounds that they don't know enough to be worried. Take a "novice" who has read all the x-risk books, forum posts and podcasts vs an AI expert who's worked on ML for 15 years. It's possible that they know the same amount about AI X-risk mitigation, and would perhaps have a similar success rate working on some alignment research (which to a great extent involves GPT-3 prompt hacking with near-0 maths).

What's more, an AI novice might be better off than an AI expert. They might find it easier to navigate the funding landscape, have more time/smaller opportunity cost to go to all the EA events, are less likely to critically argue all the time, and thus may have better opportunities to get involved in grantmaking or maybe get smaller grants themselves. Imagine that two groups wanted to organise an AI camp or event: a group of AI novice undergrads who have been engaged in EA vs a group of AI profs with no EA connections. Who is more likely to get funding?

EA-funded AI safety is actually a pretty sweet deal for an AI novice who gets to do something that's cool at very little cost.

Consequently, it's possible to be skeptical of the motivations of anyone in AI safety, expert or novice, on the grounds that "isn't it convenient the best way to save the world is to do cool AI stuff?"


 

Consequently, it's possible to be skeptical of the motivations of anyone in AI safety, expert or novice, on the grounds that "isn't it convenient the best way to save the world is to do cool AI stuff?"

 

Fair point overall, and I'll edit in a link to this comment in the post. It would be interesting to see data on what percentage of people working on AI safety due to EA motivations would likely be working in AI regardless of impact. I'd predict that it's significant but not a large majority (say, 80% CI of 25-65%).

A few reactions to specific points/claims:

It's possible that they know the same amount about AI X-risk mitigation, and would perhaps have a similar success rate working on some alignment research (which to a great extent involves GPT-3 prompt hacking with near-0 maths).

My understanding is that most alignment research involves either maths or skills similar to ML research/engineering; there is some ~GPT-3 prompt hacking (e.g. this post?) but it seems like <10% of the field?

Imagine that two groups wanted to organise an AI camp or event: a group of AI novice undergrads who have been engaged in EA vs a group of AI profs with no EA connections. Who is more likely to get funding?

I'm not sure about specifically organizing an event, but I'd guess that experienced AI profs with no EA connections but who seemed genuinely interested in reducing AI x-risk would be able to get substantial funding/support for their research.

EA-funded AI safety is actually a pretty sweet deal for an AI novice who gets to do something that's cool at very little cost.

The field has probably gotten easier to break into over time but I'd guess most people attempting to enter still experience substantial costs, such as rejections and mental health struggles.

I found this unusually moving and comprehensive; thank you.

I really appreciate you writing this. Getting clear on one's own reasoning about AI seems really valuable, but for many people, myself included, it's too daunting to actually do. 

If you think it's relevant to your overall point, I would suggest moving the first two footnotes (clarifying what you mean by short timelines and high risk) into the main text. Short timelines sometimes means <10 years and high risk sometimes means >95%

I think you're expressing your attitude to the general cluster of EA/rationalist views around AI risk typified by eg. Holden and Ajeya's views (and maybe Paul Christiano's, I don't know) rather than a subset of those views typified by eg. Eliezer (and maybe other MIRI people and Daniel Kokotajlo, I don't know).  To me, the main text implies you're thinking about the second kind of view, but the footnotes are about the first. 

And different arguments in the post apply more strongly to different views. Eg

  • Fewer 'smart people disagree' about the numbers in your footnote than about the more extreme view. 
  • I'm not sure Eliezer having occasionally been overconfident, while getting the general shape of things right, is any evidence at all against >50% AGI in 30 years or >15% chance of catastrophe this century (though it could be evidence against Eliezer's very high risk view).
  • The Carlsmith post you say you roughly endorse seems to have 65% on AGI in 50 years, with a 10% chance of existential catastrophe overall. So I'm not sure if that means your conclusion is 
    • 'I agree with this view I've been critically examining'  
    •  'I'm still skeptical of 30 year timelines with >15% risk, but I roughly endorse 50 year timelines with 10% risk'
    • 'I'm skeptical of 10 year timelines with >50% risk, but I roughly endorse 30-50 year timelines with 5-20% risk'
    • Or something else 

Thanks for pointing this out. I agree that I wasn't clear about this in the post.

My hesitations have been around adopting views with timelines and risk level that are at least as concerning as the OpenPhil cluster (Holden, Ajeya, etc.) that you're pointing at; essentially views that seem to imply that AI and things that feed into it are clearly the most important cause area.

I'm not sure Eliezer having occasionally been overconfident, while getting the general shape of things right, is any evidence at all against >50% AGI in 30 years or >15% chance of catastrophe this century (though it could be evidence against Eliezer's very high risk view).

I wouldn't go as far as no evidence at all, given that my understanding is that Eliezer (+ MIRI) was heavily involved in influencing the OpenPhil cluster's views, so it's not entirely independent, but I agree it's much weaker evidence for less extreme views.

Fewer 'smart people disagree' about the numbers in your footnote than about the more extreme view.

I was going to say that it seems like a big difference within our community, but both clusters of views are very far away from the median pretty reasonable person and the median AI researcher. Though I suppose the latter actually isn't far away on timelines (potentially depending on the framing?). It definitely seems to be in significant tension with how AI researchers and the general public / markets / etc. act, regardless of stated beliefs (e.g. I found it interesting how short the American public's timelines are, compared to their actions). 

Anyway, overall I think you're right that it makes a difference but it seems like a substantive concern for both clusters of views.

The Carlsmith post you say you roughly endorse seems to have 65% on AGI in 50 years, with a 10% chance of existential catastrophe overall. So I'm not sure if that means your conclusion is 

[...]

The conclusion I intend to convey is something like "I'm no longer as hesitant about adopting views which are at least as concerning as >50% chance of AGI/TAI/APS-AI within 30 years, and >15% chance of existential catastrophe this century", which, as I referred to above, seem to make AI clearly the most important cause area.

Copying my current state on the object level views from another recent post:

I’m now at ~20% by 2036; my median is now ~2050 though still with a fat right tail.


My timelines shortening [due to reflecting on MATH breakthrough] should also increase my p(AI doom by 2100) a bit, though I’m still working out my views here. I’m guessing I’ll land somewhere between 20 and 60% [TBC, most of the variance is coming from working out my views and not the MATH breakthrough].

I beg people to think for themselves on this issue instead of making their decision about what to believe mainly on the basis of deference and bias-correction heuristics. Yes, you can't think for yourself about every issue, there just isn't enough time in the day. But you should cultivate the habit of thinking for yourself about some issues at least, and I say this should be one of them. 

Could you elaborate on the expected value of the future point? Specifically, it's unclear to me how it should affect your credence of AI risk or AI timelines.

Yeah, the idea is that the lower the expected value of the future the less bad it is if AI causes existential catastrophes that don't involve lots of suffering. So my wording was sloppy here; lower EV of the future perhaps decreases the importance of (existential catastrophe-preventing) AI risk but not my credence in it.

Understood, thanks!

I think you are somewhat overestimating current Minerva capabilities. The MATH dataset is not so hard; the person who got 40% "did not like mathematics" and had 1 hour per 20 questions.

Generally, I agree with more or less everything in the essay, just wanted to nitpick on this particular thing.

I agree, though I suspect that one was simply designed badly.