I was chatting recently with someone who had difficulty knowing how to orient to x-risk work (despite being a respected professional working in that field). They expressed that they didn't find it motivating at a gut level in the same way they did poverty or animal work; and, relatedly, that they felt some disconnect between the arguments they intellectually believed and what they felt to be important.
I think that existential security (or something like it) should be one of the top priorities of our time. But I usually feel good about people in this situation paying attention to their gut scepticism or nagging doubts about x-risk (or about parts of particular narratives about x-risk, or whatever). I’d encourage them to spend time trying to name their fears, and see what they think then. And I’d encourage them to talk about these things with other people, or write about the complexities of their thinking.
Partly this is because I don't expect people who use intellectual arguments to override their gut to do a good job of consistently tracking, at a micro-scale, what the most important things to do are. So it would be good to get the different parts of them more in sync.
And partly it's because exploring and writing about these things seems like a public good. Either their gut is onto something with parts of its scepticism, in which case it would be great to have that articulated; or their gut is wrong, but if other people have similar gut reactions then playing out that internal dialogue in public could be pretty helpful.
It's a bit funny to make this point about x-risk in particular, because of course all of the above applies to any topic. But I think people normally grasp it intuitively, and somehow that's less universal around x-risk. I guess this may be because people don't have any first-hand experience with x-risk, so their introductions to it all come via explicit arguments. It's true that this is a domain where we should be unusually unwilling to trust our gut takes without hearing the arguments. But it seems to me that people are unusually likely to forget that they can know things which bear on the questions without those things already being explicit (and perhaps the social environment, in encouraging people to take explicit arguments seriously, can accidentally overstep and end up discouraging people from taking anything else seriously). These dynamics seem especially strong in the case of AI risk — which I regard as the most serious source of x-risk, but also the one where I most wish people spent more time exploring their nagging doubts.
I resonate a lot with this post. Thank you for writing it and for giving people like me an opportunity to express our thoughts on the topic. I'm writing from an anonymous account because publicly stating things like 'I'm not sure it would be bad for humanity to be destroyed' seems dangerous for my professional reputation. I don't like being non-transparent, but the risks here seem too great.
I currently work at an organization dedicated to reducing animal suffering. I've recently wondered a lot whether I should go work on reducing x-risks from AI: it seems there's work in AI safety where I could be counterfactually useful. But after having had about a dozen discussions with people from the AI safety field, I still don't have the gut feeling that reducing x-risks from AI deserves my energy more than reducing animal suffering in the short term.
I am not at all an expert on issues around AI, so take what follows as 'the viewpoint of someone outside the world of AI safety / x-risks trying to form an opinion on these issues, with the constraint of having a limited amount of time to do so'.
The reasons are:
Ultimately, the two questions I would like to answer are:
Interesting points!
Is AI not being aligned at all really a live option? Pre-training relies on lots of human data, so it alone leads to some alignment with humanity. Then I would say that current frontier models, post-alignment, already have better values than a random human, so I assume alignment tech...