MichaelA

I'm a Researcher and Writer for Convergence Analysis (https://www.convergenceanalysis.org/), an existential risk strategy research group.

Posts of mine that were written for/with Convergence will mention that fact. In other posts, and in most of my comments, opinions expressed are my own.

I'm always very interested in feedback, comments, ideas, etc., and potentially research/writing collaborations.

About half of my posts are on LessWrong: https://www.lesswrong.com/users/michaela

MichaelA's Comments

What actions would obviously decrease x-risk?

You may already be aware of this, and/or the window of relevance may have passed, but just thought I'd mention that Toby Ord discusses a similar matter in The Precipice. He seems to come to roughly similar conclusions to you and to Sagan et al., assuming I'm interpreting everyone correctly.

E.g. he writes:

There is active debate about whether more should be done to develop deflection methods ahead of time. A key problem is that methods for deflecting asteroids away from Earth also make it possible to deflect asteroids towards Earth. This could occur by accident (e.g. while capturing asteroids for mining), or intentionally (e.g. in a war, or in a deliberate attempt to end civilization). Such a self-inflicted asteroid impact is extremely unlikely, yet may still be the bigger risk.

This seems like an interesting and important point, and an example of how valuable it can be to consider issues like downside risks, the unilateralist’s curse, etc. - perhaps especially in the area of existential risk reduction. And apparently even with what we might see as one of the rare "obviously" good options!

Something I find slightly odd, and that might conflict with your views or Sagan et al.'s, is that Ord also wrote:

One reason [such a self-inflicted asteroid impact] is unlikely is that several of the deflection methods (such as nuclear explosions) are powerful enough to knock the asteroid off course, but not refined enough to target a particular country with it. For this reason, these might be the best methods to pursue.

I don't really know anything about this area, but it seems strange to hear that the option involving nuclear explosions is the safer one. And I wonder whether the increased stock of explosives, the development of technology for delivering them to asteroids, etc., could increase risks independently of asteroid deflection, such as if they could be repurposed for directly harming countries on Earth. Or perhaps this could reduce the safety benefits we'd get from having colonies on other moons/planets/asteroids/etc.?

Again, though, this is a field I know almost nothing about. And I assume Ord considered these points. Also, obviously there are many nuclear weapons and delivery mechanisms already.

MichaelA's Shortform

Collection of ways of classifying existential risk pathways/mechanisms

Each of the following works shows, or can be read as showing, a different model/classification scheme/taxonomy:

Personally, I think the model/classification scheme in Defence in Depth is probably the most useful. But I think at least a quick skim of the other sources is worthwhile too; each provides an additional useful angle or tool for thought.

I intend to add to this list over time. If you know of other relevant work, please mention it in a comment.

Wait, exactly what are you actually collecting here?

The scope of this collection is probably best revealed by checking out the above sources.

But to clarify further, here are two things that aren't included in the scope:

  • Classifications into things like "AI risk vs biorisk", or "natural vs anthropogenic"
    • Such categorisation schemes are clearly very important, but they're also well-established and you probably don't need a list of sources that show them.
  • Classifications into different "types of catastrophe", such as Ord's distinction between extinction, unrecoverable collapse, and unrecoverable dystopia
    • This is also very important, and maybe I should make such a collection at some point, but it's a separate matter from this one.
My thoughts on Toby Ord’s existential risk estimates

Some other context I perhaps should've given is that Ord writes that his estimates already:

incorporate the possibility that we get our act together and start taking these risks very seriously. Future risks are often estimated with an assumption of ‘business as usual’: that our levels of concern and resources devoted to addressing the risks stay where they are today. If I had assumed business as usual, my risk estimates would have been substantially higher. But I think they would have been misleading, overstating the chance that we actually suffer an existential catastrophe. So instead, I’ve made allowances for the fact that we will likely respond to the escalating risks, with substantial efforts to reduce them.

The numbers therefore represent my actual best guesses of the chance the threats materialise, taking our responses into account. If we outperform my expectations, we could bring the remaining risk down below these estimates. Perhaps one could say that we were heading towards Russian roulette with two bullets in the gun, but that I think we will remove one of these before it’s time to pull the trigger. And there might just be time to remove the last one too, if we really try.

My thoughts on Toby Ord’s existential risk estimates

Update: I'm now creating this sort of collection of estimates, partly inspired by this comment thread (so thanks, MichaelStJules!). I'm not yet sure whether I'll publish it; I think collecting a diversity of views together will reduce rather than exacerbate information cascades and the like, but I'm not certain. I'm also not sure when I'd publish, if I do.

But I think the answers are "probably" and "within a few weeks".

If anyone happens to know of something like this that already exists, and/or has thoughts on whether publishing something like this would be valuable or detrimental, please let me know :)

My thoughts on Toby Ord’s existential risk estimates

Also, I expect to see small engineered pandemics, but only after effective genetic engineering is widespread. So the fact that we haven't seen any so far is not much evidence.

Yes, that was broadly the response I had in mind as well. The same goes for most of the "unforeseen"/"other" anthropogenic risks; those categories are in the chapter on "Future risks", and are mostly things Ord appears to think will or may become riskier as certain technologies are developed or advanced.

Sleepy reply to Tobias' "Ord's estimates seem too high to me": An important idea in the book is that "the per-century extinction risks from “natural” causes must be very low, based in part on our long history of surviving such risks" (as I phrase it in this post). The flipside of that is roughly the argument that we don't have strong evidence of our ability to survive (uncollapsed and sans dystopia) a long period with various technologies that will be developed later but don't yet exist.

Of course, that doesn't seem sufficient by itself as a reason for a high level of concern, as some version of that could've been said at every point in history when "things were changing". But if you couple that general argument with specific reasons to believe upcoming technologies could be notably risky, you could perhaps reasonably arrive at Ord's estimates. (And there are obviously a lot of specific details and arguments and caveats that I'm omitting here.)

Takeaways from safety by default interviews

One minor quibble with this post's language, rather than any of its actual claims: The title includes the phrase "safety by default", and the terms "optimism" and "optimist" are repeatedly applied to these researchers or their views. The title is reasonable in a sense, as these interviews were partially/mostly about whether AI would be "safe by default", or why we might believe that it would be, or why these researchers believe that that's likely. And the use of "optimism"/"optimist" is reasonable in a sense, as these researchers were discussing why they're relatively optimistic compared to, e.g., the "typical MIRI view".

But it seems potentially misleading to use those phrases here without emphasising (or at least mentioning) that at least some of these researchers think there's a greater than 1% chance of extinction or other existential catastrophe as a result of AI. E.g., the statement "Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention" implies a 10% credence that that won't be the case (and Paul and Adam seem to share very roughly similar views, based on Rohin's summaries). Relevant quote from The Precipice:

In 1939, Enrico Fermi told Szilard the chain reaction was but a ‘remote possibility’ [...]
Fermi was asked to clarify the ‘remote possibility’ and ventured ‘ten percent’. Isidor Rabi, who was also present, replied, ‘Ten percent is not a remote possibility if it means that we may die of it. If I have pneumonia and the doctor tells me that there is a remote possibility that I might die, and it’s ten percent, I get excited about it’

And in this case, the stakes are far greater (meaning no offence to Isidor Rabi).

My guess would be that a decent portion of people who (a) were more used to something like the FHI/80k/Oxford views, and less used to the MIRI/Bay Area views, and (b) read this without having read the interviews in great detail, might think that these researchers believe something like "The chance things go wrong is too small to be worth anyone else worrying about." Which doesn't seem accurate, at least for Rohin, Paul, and Adam.

To be clear: I don't think you're intending to convey that message. And I definitely wouldn't want to try to shut down any statements about AI that don't sound like "this is a huge deal, everyone get in here now!" I'm just a bit concerned about posts accidentally conveying an overly optimistic/sanguine message when that wasn't actually their intent, and when it wasn't supported by the arguments/evidence provided.

(Something informing this comment is my past experience reading a bunch of cognitive science work on how misinformation spreads and can be sticky. Some discussion here, and a particularly relevant paper here.)

Takeaways from safety by default interviews

Thanks for this post!

There are lots of calls for individuals with views around AI risk to engage with each other and understand the reasoning behind fundamental disagreements.

I strongly share this view, and have therefore quite appreciated this project by AI Impacts. A lot of resources are going into this field, which I'm broadly very supportive of, but it does seem worth gaining a clearer understanding of precisely why, and precisely which approaches should get which portions of those resources.

One other post that I personally found really useful for understanding the various views, the underlying assumptions, and their interconnections was Clarifying some key hypotheses in AI alignment (coauthored by Rohin). I've also collected here ~30 works I found that strongly relate to this goal (I plan to update that collection over time, and have now added this post to it as well).

And I'm currently working on a post with a very similar objective, but for longtermist/x-risk strategies more broadly. Hopefully that'll be out soon.

Assumptions about the far future and cause priority

Interesting view. It seems to me like it makes sense, but I also feel like it'd be valuable for it to be fleshed out and critiqued further to see how solid it is. (Perhaps this has already been done somewhere - I do feel like I've heard vaguely similar arguments here and there.)

Also, arriving at this thread 5 months late, I notice Toby Ord makes a similar argument in The Precipice. He writes about:

a subtle form of correlation - not between two risks, but between risks and the value of the future. There might be risks that are much more likely to occur in worlds with high potential. For example, if it is possible to create artificial intelligence that far surpasses humanity in every domain, this would increase the risk from misaligned AGI, but would also increase the value we could achieve using AGI that was aligned with human values. By ignoring this correlation, the total risk approach underweights the value of work on this risk.
This can be usefully understood in terms of there being a common cause for the risk and the benefit, producing the correlation. A high ceiling on technological capability might be another common cause between a variety of risks and extremely positive futures. I will set this possibility aside in the rest of the book, but it is an important issue for future work to explore.

My thoughts on Toby Ord’s existential risk estimates

Just found a quote from the book which I should've mentioned earlier (perhaps this should've also been a footnote in this post):

any notion of risk must involve some kind of probability. What kind is involved in existential risk? Understanding the probability in terms of objective long-run frequencies won't work, as the existential catastrophes we are concerned with can only ever happen once, and will always be unprecedented until the moment it is too late. We can't say the probability of an existential catastrophe is precisely zero just because it hasn't happened yet.
Situations like these require an evidential sense of probability, which describes the appropriate degree of belief we should have on the basis of the available information. This is the familiar type of probability used in courtrooms, banks and betting shops. When I speak of the probability of an existential catastrophe, I mean the credence humanity should have that it will occur, in light of our best evidence.

And I'm pretty sure there was another quote somewhere about the complexities with this.

As for your comment, I'm not sure if we're just using language slightly differently or actually have different views. But I think we do have different views on this point:

If there's only really one reasonable model, and all of the probabilities are pretty precise in it (based on precedent), then the final probability should be pretty precise, too.

I would say that, even if one model is the most (or only) reasonable one we're aware of, we should still account for model uncertainty (or uncertainty about the argument) if we're not certain about the model. So (I think) even if we don't have specific reasons for other precise probabilities, or for decreasing the precision, we should still make our probabilities less precise, because there could be "unknown unknowns", mistakes in our reasoning process, and so on.

If we know that our model might be wrong, and we don't account for that when thinking about how certain or uncertain we are, then we're not using all the evidence and information we have. Thus, we wouldn't be striving for that "evidential" sense of probability as well as we could. And, more importantly, it seems likely we'd predictably do worse at making plans and achieving our goals.
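
To make that concrete, here's a minimal sketch of the kind of adjustment I have in mind, using purely hypothetical numbers (not estimates from the book or from me). If M is the claim that our model/argument is essentially right, then by the law of total probability:

P(catastrophe) = P(M) × P(catastrophe | M) + P(not-M) × P(catastrophe | not-M)

So if, say, P(M) = 0.9, the model itself says P(catastrophe | M) = 0.001, and we'd guess something like P(catastrophe | not-M) = 0.05 (admittedly hard to pin down), the all-things-considered figure is about 0.0059, nearly six times the model's own number, with most of that coming from the model-uncertainty term.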

Interestingly, Ord is among the main people I've seen making the sort of argument I make in the prior paragraph, both in this book and in two prior papers (one of which I've only read the abstract of). This increased my surprise at his appearing to suggest he was fairly confident these estimates were of the right order of magnitude.

My thoughts on Toby Ord’s existential risk estimates

I'd be curious to know if there are others who have worked as hard on estimating any of these probabilities and how close their estimates are to his.

I definitely share this curiosity. In a footnote, I link to this 2008 "informal survey", which is the closest thing I'm aware of (in the sense of being somewhat comprehensive). It's a little hard to compare the estimates, as that survey was about extinction (or sub-extinction events) rather than existential catastrophe more generally, and covered the period before 2100 rather than before 2120. But it seems to be overall somewhat more pessimistic than Ord, though in roughly the same ballpark for "overall/total risk", AI, and engineered pandemics at least.

Off the top of my head, I don't know of anything comparable in terms of amount of effort, except individual AI researchers estimating the risks from AI, or specific types of AI catastrophe - nothing broader. Or maybe a couple of 80k problem profiles. And I haven't seen these collected anywhere - I think it could be cool if someone did that (and made sure the collection prominently warned against anchoring etc.).

A related and interesting question would be "If we do find past or future estimates based on as much hard work, and find that they're similar to Ord's, what do we make of this observation?" It could be taken as strengthening the case for those estimates being "about right". But it could also be evidence of anchoring or information cascades. We'd want to know how independent the estimates were. (It's worth noting that the 2008 survey was from FHI, where Ord works.)
