Erik Jenner



Oh, I agree that comment + downvote is more useful for others than only downvote, my main claim was that only downvote is more useful than nothing. So I don't want there to be a norm that you need to comment when downvoting, if that leads to fewer people voting (which I think would be likely). See Well-Kept Gardens Die By Pacifism for some background on why I think that would be really bad.

To be clear, I don't want to discourage commenting to explain votes; I just think the decision of whether that's worth your time should be up to you.

I understand some people don’t feel that certain posts are worth engaging with (which is fine) but at least don’t downvote then?

I disagree, I think it's perfectly fine for people to downvote posts without commenting. The key function of the karma system is to control how many people see a given piece of content, so I think up-/downvotes should reflect "should this content be seen by more people here?" If I think a post clearly isn't worth reading (and in particular not engaging with), then IMO it makes complete sense to downvote so that fewer other people spend time on it. In contrast, if I disagree with a post but think it's well-argued and worth engaging with I would not downvote, and would engage in the comments instead.

Ok, thanks for clarifying! FWIW, everything I said was meant to be specifically about AGI takeover because of misalignment (i.e. excluding misuse), so it does seem we disagree significantly about the probability of that scenario (and about the effect of using less conjunctive models). But it probably doesn't make sense to get into that discussion too much, since my actual cruxes are mostly on the object level (i.e. to convince me of low AI x-risk, I'd find specific arguments about what's going to happen and why much more persuasive than survey-based models).

Yeah, I totally agree that combining such a detailed analysis as you are doing with structural uncertainty would be a really big task. My point certainly wasn't that you hadn't done "enough work"; this is already a long and impressive write-up.

I will say though that if you agree that model uncertainty would likely lead to substantially higher x-risk estimates, the takeaways in this post are very misleading. E.g.:

  • "The headline figure from this essay is that I calculate the best estimate of the risk of catastrophe due to out-of-control AGI is approximately 1.6%."
  • "analysis of uncertainty reveals that the actual risk of AI Catastrophe is almost an order of magnitude less than most experts think it is"
  • "the main result I want to communicate is that it is more probable than not that we live in a world where the risk of AGI Catastrophe is <3%."

I disagree with each of those claims, and I don't think this post makes a strong enough case to justify them. Maybe the crux is this:

in reality both the model structural uncertainty and parameter uncertainty will contribute to the overall uncertainty.

My main point was not that structural uncertainty will increase our overall uncertainty, it was that specifically using a highly conjunctive model will give very biased results compared to considering a broader distribution over models. I'm not sure, based on your reply, whether you agree with that (if not, then the takeaways make more sense, but in that case we do have a substantial disagreement).

Models such as the Carlsmith one, which treat AI x-risk as highly conjunctive (i.e. lots of things need to happen for an AI existential catastrophe), already seem like they'll bias results towards lower probabilities (see e.g. this section of Nate's review of the Carlsmith report). I won't say more on this since I think it's been discussed several times already.

What I do want to highlight is that the methodology of this post exacerbates that effect. In principle, you can get reasonable results with such a model if you're aware of the dangers of highly conjunctive models and sufficiently careful in assigning probabilities.[1] This might at least plausibly be the case for a single person giving probabilities, who has hopefully thought about how to avoid the multiple stage fallacy and spent a lot of time thinking about their probability estimates. But if you just survey a lot of people, you'll very likely get at least a sizable fraction of respondents who e.g. just tend to assign probabilities close to 50% because anything else feels overconfident, or who don't actually condition enough on previous steps having happened, even if the question tells them to. (This isn't really meant to critique the people who answered the survey—it's genuinely hard to give good probabilities for these conjunctive models.) The way the analysis in this post works, if some people give probabilities that are too low, the overall result will also be very low (see e.g. this comment).
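To illustrate the compounding effect (with entirely made-up numbers, not the post's actual survey data): in a conjunctive model, even a modest downward bias in a minority of per-stage responses gets multiplied across stages. A minimal sketch:

```python
import random

random.seed(0)

# Hypothetical illustration: a 5-stage conjunctive model where each
# stage's "true" conditional probability is 0.8, so the true joint
# risk is 0.8**5 ~= 0.33.
stages = 5
true_p = 0.8

def survey_response():
    # A simulated minority of respondents under-condition on previous
    # stages and report probabilities well below the true conditional;
    # the rest answer roughly correctly.
    if random.random() < 0.3:
        return random.uniform(0.2, 0.5)
    return random.uniform(0.7, 0.9)

n = 10_000
# Aggregate each stage by averaging responses, then multiply stages.
stage_means = [
    sum(survey_response() for _ in range(n)) / n for _ in range(stages)
]
joint = 1.0
for p in stage_means:
    joint *= p

print(f"true joint risk:      {true_p ** stages:.3f}")
print(f"survey-derived joint: {joint:.3f}")
```

The per-stage bias here is moderate, but after five multiplications the survey-derived joint probability ends up far below the true value; the reverse holds for disjunctive models.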

I would strongly guess that if you ran exactly the same type of survey and analysis with a highly disjunctive model (e.g. more along the lines of this one by Nate Soares), you would get way higher probabilities of X-risk. To be clear, that would be just as bad: it would likely be an overestimate!

One related aspect I want to address:

Most models of AI risk are – at an abstract enough level – more like an elimination tournament than a league, at least based on what has been published on various AI-adjacent forums. The AI needs everything to go its way in order to catastrophically depower humanity.

There is a lot of disagreement about whether AI risk is conjunctive or disjunctive (or, more realistically, where it is on the spectrum between the two). If I understand you correctly (in section 3.1), you basically found only one model (Carlsmith) that matched your requirements, which happened to be conjunctive. I'm not sure if that's just randomness, or if there's a systematic effect where people with more disjunctive models don't tend to write down arguments in the style "here's my model, I'll assign probabilities and then multiply them".

If we do want to use a methodology like the one in this post, I think we'd need to take uncertainty over the model itself extremely seriously. E.g. we could come up with a bunch of different models, assign weights to them somehow (e.g. survey people about how good a model of AI x-risk this is), and then do the type of analysis you do here for each model separately. At the end, we average over the probabilities each model gives using our weights. I'm still not a big fan of that approach, but at least it would take into account the fact that there's a lot of disagreement about the conjunctive vs disjunctive character of AI risk. It would also "average out" the biases that each type of model induces to some extent.
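As a sketch of what that averaging could look like (all model names, weights, and per-model risk estimates below are hypothetical placeholders, not numbers from the post):

```python
# Hypothetical model-averaging sketch: each candidate model of AI
# x-risk gets a weight (e.g. from surveying people on how good a
# model it is), and we average the per-model risk estimates.
model_risk = {
    "conjunctive (Carlsmith-style)": 0.016,
    "disjunctive (Soares-style)": 0.60,
    "intermediate": 0.15,
}
model_weight = {
    "conjunctive (Carlsmith-style)": 0.4,
    "disjunctive (Soares-style)": 0.3,
    "intermediate": 0.3,
}

total_weight = sum(model_weight.values())
averaged_risk = sum(
    model_risk[m] * model_weight[m] / total_weight for m in model_risk
)
print(f"weighted-average risk estimate: {averaged_risk:.3f}")
```

The point of the sketch is just that the final number is dominated by neither the conjunctive nor the disjunctive extreme, unlike an analysis that commits to a single model structure up front.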


  1. ^

    Though there's still the issue of disjunctive pathways being completely ignored, and I also think it's pretty hard to be sufficiently careful.

"EA has a strong cultural bias in favor of believing arbitrary problems are solvable".

I think you're pointing to a real phenomenon here (though I might not call it an "optimism bias"—EAs also tend to be unusually pessimistic about some things).

I have pretty strong disagreements with a lot of the more concrete points in the post, though; I've tried to focus on the most important ones below.

Conclusion One: Pursuing the basic plan entailed in premises 1-4 saves, in expectation, at least 4.8 million lives (800,000 * 0.06 * 0.1 * 0.1). 

(I think you may have missed the factor of 0.01, the relative risk reduction you postulated? I get 8 billion * 0.06 * 0.01 * 0.1 * 0.1 = 48,000. So AI safety would look worse by a factor of 100 compared to your numbers.)
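For what it's worth, the corrected arithmetic is easy to check (just reproducing the numbers from the comment above):

```python
# Expected lives saved under the post's premises, including the
# factor of 0.01 (the postulated 1% relative risk reduction):
# 8 billion people * 6% * 1% * 10% * 10%
lives = 8_000_000_000 * 0.06 * 0.01 * 0.1 * 0.1
print(f"{lives:,.0f}")  # prints 48,000
```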

But anyway, I strongly disagree with those numbers, and I'm pretty confused as to what kind of model generates them. Specifically, you seem to be extremely confident that we can't solve AI X-risk (< 1/10,000 chance if we multiply together the 1% relative reduction with your two 10% chances). On the other hand, you think we'll most likely be fine by default (94%). So you seem to be saying that there probably isn't any problem in the first place, but if there is, then we should be extremely certain that it's basically intractable. This seems weird to me. Why are you so sure that there isn't a problem which would lead to catastrophe by default, but which could be solved by e.g. 1,000 AI safety researchers working for 10 years? To get to your level of certainty (<1/10,000 is a lot!), you'd need a very detailed model of AI X-risk IMO, more detailed than I think anyone has written about. A lot of the uncertainty people tend to have about AI X-risk comes specifically from the fact that we're unsure what the main sources of risk are etc., so it's unclear how you'd exclude the possibility that there are significant sources of risk that are reasonably easy to address.

As to why I'm not convinced by the argument that leads you to the <1/10,000 chance: the methodology of "split my claim into a conjunction of subclaims, then assign reasonable-sounding probabilities to each, then multiply" often just doesn't work well (there are exceptions, but this certainly isn't one of them IMO). You can get basically arbitrary results by splitting up the claim in different ways, since which probabilities sound "reasonable" isn't very consistent across humans.

Okay, a longtermist might say. Maybe the odds are really slim that we thread this needle, and then also the subsequent needles required to create an interstellar civilization spanning billions of years. But the value of that scenario is so high that if you shut up and multiply, it's worth putting a lot of resources in that direction.

I can't speak for all longtermists of course, but that is decidedly not an argument I want to make (and FWIW, my impression is that this is not the key objection most longtermists would raise). If you convinced me that our chances of preventing an AI existential catastrophe were <1/10,000, and that additionally we'd very likely die in a few centuries anyway (not sure just how likely you think that is?), then I would probably throw the expected value calculations out the window and start from scratch trying to figure out what's important. Basically for exactly the reasons you mention: at some point this starts feeling like a Pascal's mugging, and that seems fishy and confusing.

But I think the actual chances we prevent an AI existential catastrophe are way higher than 1/10,000 (more like 1/10 in terms of the order of magnitude). And I think conditioned on that, our chances of surviving for billions of years are pretty decent (very spontaneous take: >=50%). Those feel like cruxes to me way more than whether we should blindly do expected value calculations with tiny probabilities, because my probabilities aren't tiny.


Scenario Two: Same as scenario one, but there's a black hole/alien invasion/unstoppable asteroid/solar flare/some other astronomical event we don't know about yet that unavoidably destroys the planet in the next millennium or two. (I don't think this scenario is likely, but it is possible.)

I agree it's possible in a very weak sense, but I think we can say something stronger about just how unlikely this is (over the next millennium or two): Nothing like this has happened over the past 65 million years (where I'm counting the asteroid back then as "unstoppable" even though I think we could stop that soon after AGI). So unless you think that alien invasions are reasonably likely to happen soon (but weren't likely before we sent out radio waves, for example), this scenario seems to be firmly in the "not really worth thinking about" category.

This may seem really nitpicky, but I think it's important when we talk about how likely it is that we'll continue living for billions of years. You give several scenarios for how things could go badly, but it would be just as easy to list scenarios for how things could go well. Listing very unlikely scenarios, especially just on one side, actively distorts our impression of the overall probabilities.