
This is the script of a talk I gave at EAGx Rotterdam, with some citations and references linked throughout. I lay out an argument challenging the relatively narrow focus EA has in existential risk studies, and in favour of more methodological pluralism. This isn't a finalised thesis, nor should it be taken as anything except a conversation starter. I hope to follow this up with more rigorous work exploring the questions I pose over the next few years, and hope others do too, but I thought to post the script to give everyone an opportunity to see what was said. Note, however, that the tone of this is obviously that of a speech, not of a forum post. I hope to link the video when it's up. Mostly, this is really a synthesis of the work of others; very little of it is my own original thought. If people are interested in talking to me about this, please DM me on here.

Existential Risk Studies: the interdisciplinary “science” of studying existential and global catastrophic risk. So, what is the object of our study? There are many definitions of Existential Risk, including an irrecoverable loss of humanity's potential or a major loss of the expected value of the future, both essentially from a transhumanist perspective. In this talk, however, I will be using Existential Risk in the broadest sense, taking my definition from Beard et al. (2020): risk that may result in the very worst catastrophes, “encompassing human extinction, civilizational collapse and any major catastrophe commonly associated with these things.”

X-Risk is a risk, not an event. It is therefore defined by potentiality, and thus is inherently uncertain. We can thus clearly distinguish between different global and existential catastrophes (nuclear winters, pandemics) and drivers of existential risk, and there is no one-to-one mapping between them. The IPCC commonly, and helpfully, splits drivers of risk into hazards, vulnerabilities, exposures, and responses, and through this lens it is clear that risk isn't something exogenous, but is reliant on decision making and governance failures, even if that failure is merely a failure of response.

The thesis I present here is not original, and draws on the work of a variety of thinkers, although I accept full blame for anything that may be wrong. I will argue there are two different paradigms of studying X-Risk: a simple paradigm and a complex paradigm. I will argue that EA unfairly neglects the complex paradigm, and that this is dangerous if we want a complete enough understanding of X-Risk to be able to combat it. I am not suggesting the simple paradigm is “wrong”, but that alone it currently doesn't, and never truly can, capture the full picture of X-Risk. The differences between the two paradigms are diverse: some are “intellectual”, due to fundamentally different assumptions about the nature of the world we live in, and some are “cultural”, contingent on which thinkers' works gain prominence. I won't try too hard to distinguish between these kinds of difference, as I think that would make everything a bit too complicated. This presentation is merely a start, a challenge to the status quo: not asking for it to be torn down, but arguing for more epistemic and methodological pluralism. This call for pluralism is the core of my argument.

The “simple” paradigm of existential risk is at present dominant in EA circles. It tends to assume that the best way to combat X-Risk is to identify the most important hazards, find the most tractable and neglected solutions to those, and work on them. It often takes up a relatively narrow range of epistemic tools: forecasting and toy models, such as game-theoretic approaches, thought experiments, or well-thought-out “kill mechanism” causal chains. These are treated as fundamentally useful tools for examining a future which is taken to be fundamentally understandable and, to a degree, predictable, if only we were rational enough and had enough information. It's a methodology that, given the relative lack of evidence on X-Risk, is based more on rationality than empiricism; a methodology that emerges more from analytic philosophy than empirical science. Thus, risks are typically treated quasi-independently, so the question “what is the biggest X-Risk?” makes sense, and we can approach X-Risk by focusing on quasi-discrete “cause areas” such as AGI, engineered pandemics or nuclear warfare.

Such an approach can be seen in published works by the community and in the assumptions around which programmes are set up. The Precipice finds the separation of X-Risks into the somewhat arbitrary categories of “Natural”, “Anthropogenic” and “Future” risks to be useful, and quantifies those risks based on what each of those quasi-independent hazards contributes. The Cambridge Existential Risk Initiative summer research fellowship that I was lucky to participate in this summer separated its fellows into categories based broadly on these separate, discrete risks: AI, Climate Change, Biosecurity, Nuclear Weapons and Misc+Meta. Once again, this promotes a siloed approach that sees these things as essentially independent, or at least assumes that treating them independently is the best way of understanding them. Even on the Swapcard for this conference, there is no category of interest for “Existential Risk”, “Vulnerabilities” or “Systemic Risk”, whilst there are two categories for AI, a category for Nuclear Security, a category for Climate Change and a category for Biosecurity. The “simple” approach to existential risk permeates almost all the discussions we have in EA about existential risk; it is the sea in which we swim. Thus, it profoundly affects the way we think about X-Risk. I think it could accurately be described, in the sense Kuhn discusses it, as a paradigm.

That's the simple approach. A world which, at its core, we can understand. A world where the pathways to extinction are to some degree definable, identifiable, quantifiable. Or at least, if we are rational enough and research enough, we can understand what the most important X-Risks are, prioritise them and deal with them. It's no wonder that this paradigm has been attractive to Effective Altruists; this stuff is our bread and butter. The idea that we can use rational methodologies in doing good is what we were founded on, and it retains its power and strength through the ITN framework. The problem is, I'm not sure this is very good at capturing the whole picture of X-Risk, and we ignore the whole picture at our peril.


Because maybe the world isn't so simple, and the future not so predictable. Every facet of our society is increasingly interconnected: our ecological-climatic system coupling to our socio-economic system, global supply chains tied to our financial system tied to our food system. A future emerging from such complexity will be far from simple, or obvious, or predictable. Risk that threatens humanity in such a world will likely interact in emergent ways, or emerge in ways that are not predictable by simple analysis. Rather than predictable “kill mechanisms”, we might worry about tipping thresholds beyond which unsafe system transitions may occur, compounding “snowballing” effects, worsening cascades, spread mechanisms of collapse, and where in a complex system we have the most leverage. Arguably, we can only get the whole picture by acknowledging irreducible complexity, and that the tools we currently use to give us relatively well-defined credences, and a sense of understanding and predictability about the future, are woefully insufficient.

I think it's important to note that my argument here is not “there is complexity, therefore risk”, but rather that the sort of global interconnected and interdependent systems we have in place make the sorts of risk we are likely to face inherently unpredictable, and so risk isn't as easily definable as the simple paradigm likes to make out. Even Ord acknowledges this unpredictability, putting the probability of “unforeseen anthropogenic risk” at 1 in 30; in fact, whilst I have constantly attacked the core of Ord's approach in this talk, I think he acknowledges many of these issues anyway. And it's not as if this approach, focusing on fuzzy mechanisms emerging out of feedback loops, thresholds and tipping points, is wholly foreign to EA; it's arguable that the risk from AGI is motivated by the existence of a tipping threshold which, once passed, may lead to magnifying impacts in a positive feedback loop (the intelligence explosion), which will lead to unknown but probably very dangerous effects that, due to the complexity of all the systems involved, we probably can't predict. This is rarely dismissed as pure hand-waviness, as we acknowledge we are dealing with a system our reasoning can't fully comprehend. Whilst EAs tend to utilise a few of the concepts of the complex approach with AGI, elsewhere it's ignored, which is slightly strange, but more on this later.

It is arguable that the complex paradigm's focus on the complexity of the world is somewhat axiomatic, based on a different set of assumptions about the way the world functions from those of the simple approach: one that sees the world as a complex network of interconnected nodes, and risk as primarily emerging from the relatively well-known fragility and vulnerability of such a system. I don't think I can fully prove this to you, because it is a fundamental worldview shift, not just a change in the facts but in the way you experience and understand the world. However, if you want to be convinced, I would look at much of the literature on complexity, on the coupled socio-technical-ecological-political system, the literature on risk such as the IPCC's, or texts like The Risk Society on how we conceptualise risk. I'm happy to talk more about this in the Q&A, but right now I hope that you're willing to come along for the ride even if you don't buy it.

This is why I treat this as an entirely different paradigm to the current EA paradigm. The complexity approach is fundamentally different. It sees the world as inherently complex: whilst facets are understandable, at its core the system is so chaotic we can never fully, or even nearly fully, understand it. It sees the future as not just unpredictable but inherently undefined. It sees risk as mostly emerging from our growing, fragile, interconnected system, and typically sees existential hazards as only one part of the equation, with vulnerabilities, exposures and responses perhaps at least as important. It takes seriously our uncertainty about the topography of the epistemic landscape, so uncertainty should be baked into any understanding or approach, and it thus favours foresight over forecasting. The epistemic tools that serve the simple approach are simply not useful for dealing with the complexity that this paradigm takes as central to X-Risk, and thus new epistemic tools and frameworks must be developed; whether these have been successful is debatable.

A defender of the “simple” paradigm might argue that this is unfair: after all, thinkers like Ord discuss “direct” and “indirect” risks. This is helpful. The problem is, it's very unclear what constitutes a “direct” vs an “indirect” existential risk. If a nuclear war kills almost everyone, but the last person alive trips on a rock and falls off a cliff, which was the direct existential risk? The nuclear war or the rock? This example could rightfully be considered absurd (after all, if only one person is alive, humanity will go extinct after that person dies), but I hope the idea still broadly stands: very few “direct” existential risks actually wipe the last person out. What about a very deadly pandemic that can only spread due to the global system of international trade, where the response of reducing transport, combined with climate change, causes major famines across the world, and only both combined cause collapse and extinction? Which is the direct risk? Suddenly, the risk stops looking so neat and simple, but remains just as worrying.

This logic of direct and indirect doesn't work, because it still favours a quasi-linear, mechanistic worldview. Often, something is only considered a “risk factor” if it leads to something that is a direct risk. Such arguments can be seen in John Halstead's enormous climate Google doc, which I think is a relatively good canonical example of the “simple” approach. Here, he argues climate change is not a large contributor to existential risk because it can't pose a direct risk, and isn't a major contributor to things that would then wipe us out. So it's not a direct risk, nor a first-order indirect risk; so it's not really a major risk. In fact, because of the simplicity of merely needing to answer whether it is a direct risk or a first-order indirect risk, there is not even a need for a methodology, or that slippery word “theory”; one can merely answer the question by thinking about it and making a best guess. The type of system and causal chain dealt with is within the realm where one person can make such a judgement; if you acknowledge the complexity of the global network, such reliance on individual reasoning looks like dangerous overconfidence.

You might then say that the simple approach can still deal with these issues by looking at second-order indirect risks, third-order, fourth-order and so on. But what happens when you get to nth-order indirect risks? This mechanistic, predictable worldview simply cannot deal with that complexity. A reply may be that direct risks are just so much larger in expectation; however, this doesn't fit with our understanding from the study of complex and adaptive networks, and work done by scholars like Lara Mani on volcanoes further shows that cascading nth-order impacts of volcanic eruptions may be far larger than the primary direct impacts. Even take the ship stuck in the Suez Canal: the ripple effects seem far larger than the initial, direct effect. The same may turn out to be true of the long-term impacts of COVID-19.

Thus it seems the simple approach struggles to deal with the ways most risks tend to manifest in the real, complex, interconnected world: through vulnerabilities and exposures, through systemic risks and through cascades. In fact, the simple approach tends to take Existential Risk to be synonymous with existential hazards, relegating other contributors to risk, like vulnerabilities, exposures and responses, to the background. It has no real theory of systemic risk, hence the lack of need for defined methodologies, and when I mentioned cascading risk to John Halstead in the context of his climate report, he said he simply didn't think it worth investigating. I don't think this is a problem with John; despite our disagreements, he is an intelligent and meticulous scholar who put a lot of effort into that report. I think this is a problem of simple existential risk analysis: it is not capable of handling the complexity of the real world.

So we need complex risk analysis that acknowledges the deep interconnectedness, emergence and complexity of the global system we are in, to truly analyse risk. But here we are faced with a dilemma. On the one hand, we recognise the irreducible complexity of the world and the inherent uncertainty of the future. On the other, we need to act within this system and understand the risks so we can combat them. So the question is: how?

The first step towards a more complex risk analysis picks up the baton from the simple approach in emphasising compounding risk: how different hazards interact. More will be said on this later.

Secondly, risk is expanded beyond the concept of existential hazards, which is what the simple paradigm focuses on, to include vulnerabilities and exposures, as well as responses. To explain vulnerabilities and exposures, imagine someone with a peanut allergy: the peanut is the hazard, the allergy the vulnerability, and being in the same room as the peanut the exposure. The hazard is what kills you, the vulnerability is why it can kill you, and the exposure is the interface between the two. So we can expand what we do to combat existential risk from just “putting out fires”, which is what the hazard-centric approach focuses on, to a more systemic approach focused on making our overall system more resilient to existential risk. We might identify key nodes where systemic failure could occur and try to increase their resilience, such as the work Lara Mani has been doing identifying global pinch points where small-magnitude volcanic eruptions may cause cascading impacts resulting in a global catastrophe.

In doing this, we abandon the nice, neat categories the simple approach creates. In many ways, it no longer makes sense to talk about risks, as though these were quasi-independent “fires” to put out. Rather, it makes sense to speak about contributors to overall risk, with attempts made to shift the system towards greater security by identifying and reducing sources of risk. This doesn't just include hazards, but other contributors as well; not just acknowledging the initial effect, but everything that made each cascade more likely. These cascades are not predictable, and the thresholds beyond which the feedback loops occur are not knowable; thus foresight, where we may get a sample of what could occur, will be far more useful than forecasting, where we try to predict what will occur. This simple linguistic shift, from risks to risk, can be surprisingly powerful at highlighting the difference between the simple and complex approaches.

Acknowledging that we don't know the pathways to extinction actually opens up new approaches to combatting risk. We may see reducing systemic vulnerability as more impactful than under the simple approach, or see reducing the probability of feedbacks, and of passing thresholds beyond which we may reasonably assume catastrophe will follow, as appropriate courses of action. Or, even if we are unsure about what exactly will kill us, we might want to focus on what is driving risk in general rather than on specific hazards, be it work on “agents of doom” or Bostrom's vulnerable world emerging out of a semi-anarchic default condition. Whilst the complex approach acknowledges the difficulties that nonlinearities and complexities bring, in other ways it allows for a broader repertoire of responses to risk as well, as Cotton-Barratt et al.'s work on defence in depth shows, for example.

Another approach to complexity may be what might be called the “Planetary Boundaries” approach. Here, we identify thresholds within which we know the system is safe, and try to avoid crossing into the unknown. It's like standing at the edge of a dark forest: it may be safe to walk in, but better safe than sorry. It applies a precautionary principle: in such a complex system, we should have the epistemic humility to simply say “better the devil you know.” This approach has rightfully been critiqued by many who tend to favour a more “simple” approach; it is very handwavy, with no clear mechanism to extinction or even collapse, and with the boundaries chosen somewhat arbitrarily. Nevertheless, it may be argued that lines had to be drawn somewhere, and wherever they were drawn would be arbitrary; this is a “play it safe” approach, because we don't know what lies beyond these points, rather than an “avoid knowable catastrophe” approach. However, such an approach is very problematic if we want to prioritise between approaches, something I will briefly discuss later.

Something similar could be said as a solution to Bostrom's “Vulnerable World” and Manheim's “Fragile World”. If increasing technological development and complexity puts us in danger, then maybe we should make every effort to stop this; after all, these things are not inevitable. Of course, Bostrom would never accept this, as to him this alone poses an X-Risk, and he instead proposes a global surveillance state, but that is slightly beside the point.

However, we are still faced with a number of problems. We are constantly moving into unprecedented territory. And sometimes we are not left with an option that is nice and without tradeoffs. MacAskill somewhat successfully argues that technological stagnation would still leave us in danger from many threats. Sometimes we have already gone into the forest, and we can hear howling, and we have no idea what is going on, and we are posed with a choice of things to do, but no option is safe. We are stuck between a rock and a hard place. Under such deep uncertainty, how can we act if we refuse to reduce the complexity of the world? We can't just play it safe, because every option fails a precautionary principle. What do we do in such cases?

This is the exact dilemma that faces me in my research. I'm researching the interactions of solar radiation modification and existential risk: both how it increases risk and how it decreases it. As it simultaneously combats a source of risk and itself increases risk, the sort of “play it safe” approach to complexity just doesn't necessarily work. But before I properly explain how I am attempting to unpick this, I ought to explain exactly what I'm on about.

Solar Radiation Modification (SRM), otherwise known as solar geoengineering, is a set of technologies that aim to reflect a small amount of sunlight to reduce warming. Sunlight reaches the Earth, and some is reflected. That which isn't is absorbed by the Earth and re-emitted as long-wave infrared radiation. Some of this escapes to space, and some gets absorbed by greenhouse gases in the atmosphere, warming it. As we increase GHG concentrations, we increase the warming. SRM tries to reduce this warming by decreasing the amount of sunlight the Earth absorbs: by injecting aerosols into the stratosphere, mimicking the natural effects of volcanoes; by brightening clouds; or by a related (though not identical) technique that involves thinning other, cirrus-type clouds. This would likely reduce temperatures globally, bringing the climate generally closer to preindustrial, but it comes with its own risks that may make it more dangerous.
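For intuition, the radiation balance just described can be written as a toy zero-dimensional energy-balance model. To be clear, this is a hypothetical sketch of my own, not anything from the talk: the crude "greenhouse fraction", the ~3.7 W/m² doubled-CO2-scale forcing and the SRM reflection fraction are rough, illustrative numbers, and real climate models are vastly more complex.

```python
# Toy zero-dimensional energy balance: absorbed sunlight (plus extra GHG
# forcing) balances outgoing longwave radiation at equilibrium.
# All numbers are rough, illustrative values, not from the talk.
SIGMA = 5.670e-8   # Stefan-Boltzmann constant (W m^-2 K^-4)
S0 = 1361.0        # solar constant (W m^-2)
ALBEDO = 0.30      # planetary albedo

def equilibrium_temp(forcing=0.0, srm_fraction=0.0, greenhouse=0.39):
    """Equilibrium surface temperature in kelvin.

    forcing: extra greenhouse forcing (W m^-2);
    srm_fraction: fraction of incoming sunlight reflected away by SRM;
    greenhouse: crude fraction of outgoing longwave trapped (tuned so
    the baseline lands near the observed ~288 K global mean).
    """
    absorbed = (1 - srm_fraction) * S0 * (1 - ALBEDO) / 4 + forcing
    # At equilibrium: absorbed = (1 - greenhouse) * sigma * T^4
    return (absorbed / ((1 - greenhouse) * SIGMA)) ** 0.25

base = equilibrium_temp()                                    # ~288 K baseline
warmed = equilibrium_temp(forcing=3.7)                       # extra GHG forcing
cooled = equilibrium_temp(forcing=3.7, srm_fraction=0.0155)  # SRM offsets it
```

In this toy, reflecting roughly 1.5% of incoming sunlight offsets a doubled-CO2-scale forcing in the global mean, which mirrors the headline logic of SRM, though crucially it says nothing about regional climates, precipitation, or any of the risks I go on to discuss.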

Those working from the simple paradigm have tended to reject risks from climate change as especially large. Toby Ord estimates the risk at 0.1%. Will MacAskill, in What We Owe the Future, suggests “it's hard to see how even [7-10 degrees of warming] could cause collapse.” Both have tended to use proxies for what would cause collapse, trying their best to come up with simple, linear models of catastrophe: Ord looks at whether heat stress will cause the world to become uninhabitable, and MacAskill at whether global agriculture will entirely collapse. These simple proxies, whilst making it easier to reason through simple causal chains, are just not demonstrative of how risk manifests. Some have then attempted to argue about whether climate change poses a first-order indirect existential risk, which is mostly John Halstead's approach in his climate report, but once again, I think this misses the point.

From a more complex paradigm, I think climate change becomes something to be taken more seriously: not only does it make hazards more likely and stunt our responses, but it also, and perhaps more keenly, makes us more vulnerable, and may act to compound risk in ways that make catastrophe far more likely. A variety of these scenarios, where a “one hazard to kill us all” approach doesn't work, were explored in the recent “Climate Endgame” paper. One area where that paper strongly disagrees with the status quo is “systemic risk”. In The Precipice, Ord argues that a single risk is more likely than two or more occurring in unison; Climate Endgame, however, explores how climate change has the ability to trigger widespread, synchronous, systemic failure via multiple indirect stressors: food system failures, economic damage, water insecurity and so on, coalescing and reinforcing until you get system-wide failure. A similar but slightly different risk is that of a cascade, with vulnerabilities increasing until one failure sets off another, and another, with the whole system snowballing; in the case of climate, this may not just refer to our socio-economic system, as evidence of tipping cascades in the physical system shows there is a non-negligible chance of major, near-synchronous collapse of major elements in the Earth system. Such spread of risk is well documented in the literature, as occurred in the 2008 financial crisis, but has been almost entirely neglected by the simple paradigm of existential risk. The ability of such reinforcing, systemic risk to arise from initial hazards far smaller than the simple paradigm would consider “catastrophic” should really worry us: lower-magnitude hazards are more common, and we are likely severely neglecting them. If one takes such systemic failures seriously, climate change suddenly looks a lot more dangerous than the simple approach lets it be.
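The snowballing dynamic described above can be illustrated with a toy threshold-contagion model on a random network. This is my own hypothetical sketch, loosely in the spirit of Watts' cascade model, and not a model from the talk or the Climate Endgame paper; the network, thresholds and shock sizes are entirely arbitrary.

```python
import random

def cascade_size(n=200, avg_degree=6, threshold=0.25, seed_failures=3):
    """Size of the failure cascade triggered by a small initial shock.

    A node fails once at least `threshold` of its neighbours have failed,
    so the same shock can fizzle or topple most of the network depending
    on how vulnerable the nodes are.
    """
    rng = random.Random(0)  # fixed seed for reproducibility
    p = avg_degree / (n - 1)
    # Build an Erdos-Renyi random graph as adjacency sets
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    # A small initial shock: a few nodes fail at random
    failed = set(rng.sample(range(n), seed_failures))
    # Propagate failures until nothing more changes
    changed = True
    while changed:
        changed = False
        for v in range(n):
            if v in failed or not nbrs[v]:
                continue
            if len(nbrs[v] & failed) / len(nbrs[v]) >= threshold:
                failed.add(v)
                changed = True
    return len(failed)

fragile = cascade_size(threshold=0.1)   # vulnerable nodes: shock snowballs
robust = cascade_size(threshold=0.99)   # resilient nodes: shock stays local
```

The point of the toy is qualitative: the identical three-node shock either stays local or spreads through most of the network depending purely on node vulnerability, which is why vulnerabilities, and not just hazard magnitudes, matter for overall risk.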

So, a technology like SRM that can reduce climate damage may seriously reduce the risk of catastrophe. There is a significant amount of evidence to suggest that SRM moderates climate impacts at relatively “median” levels of warming. However, one thing that has hardly been explored is the capacity of SRM to stop us hitting those Earth system tipping thresholds which, whilst not essential for spreading systemic risk, are certainly one key contributor to existential risk from climate change being higher. So, alongside some colleagues at Utrecht and Exeter, I am starting to investigate the literature, models and expert elicitations to make a start at understanding this question. This, then, is one way to deal with complexity: make a start with things we know contribute to systemic risk in ways that could plausibly be catastrophic, and observe whether these can be reduced.

However, SRM also acts as a contributor to risk. In one sense, this contribution is easier to understand from the simple paradigm, as it is a direct contribution to great power conflict, which is often itself considered a first-order indirect risk. So here we can perhaps agree! This has been explored in many people's work: some of it simple, two-variable analyses of the interaction of SRM and volcanic hazards, whilst some tries to highlight how SRM may change geopolitics and tensions in ways that change how other risk spreads and compounds. One key way it does this is by coupling our geopolitical-socio-political system to the ecological-climatic system, allowing risk to spread from our human system to the climatic system that supports us much faster than before. This should really worry us, given how our climatic system then feeds back into our human system, and so on.

A second manner in which it contributes is through so-called latent risk: a risk that lies “dormant” until activated. Here, if you stop carrying out SRM, you get rapid warming, often called “termination shock”, and faster rates of warming likely raise risk through all the pathways discussed for climate change. To add another wrinkle, such termination is most plausible because of another global catastrophe, so what would occur is what Seth Baum calls a “Double Catastrophe”, again highlighting how synchronous failure might be more likely than single failure! To get a better understanding of the physical effects of such a double catastrophe under different conditions, I have been exploring how SRM would interact with another catastrophe with climatic effects, namely one involving the injection of soot into the stratosphere after a nuclear exchange. Here, it's very unclear that the “termination shock” and the other effects of SRM actually make the impacts of such an exchange worse; it is likely that they actually act to slightly moderate the effects. I think this shows we cannot simply say “interacting hazards and complex risk = definitely worse”, but I also think it shows that the simple approach's neglect of such complex risk loses a hell of a lot of the picture.

The other thing I am trying to explore is the plausible cascades and spread mechanisms of risk that SRM encourages. In part, I am doing this through foresight exercises like ParEvo, where experts are brought together to generate collaborative and creative storylines of diverging futures. Unlike forecasts, such scenarios don't have probabilities attached; in fact, due to the specificity needed, a good scenario should have probability zero, like a point on a continuous probability distribution, but it can hopefully give us a little bit of a map of what could occur. So we highlight a whole load of plausible scenarios, acknowledging that none of them is likely to come to fruition, on the premise that they should highlight some of the key areas on which good action should focus. For example, my scenarios will focus on different SRM governance schemes' responses to different catastrophic shocks, hopefully highlighting common failures of governance systems under more heavy-tailed shocks. Scenarios are useful in many other areas as well, such as the use of more “game-like” scenarios like Intelligence Rising to highlight the interactions of the development of AGI with international tensions and geopolitics.

Nonetheless, ultimately what is needed is to do a risk-risk-risk-risk-risk analysis, comparing the many ways SRM reduces and contributes to risk, and where we have leverage to reduce each of those contributors. This is a way off, and I am unsure we have good methodologies for it yet. Nonetheless, by acknowledging the large complexities, and utilising methods to uncover how SRM can both contribute to and reduce risk in the global interconnected system, we get a far better picture of the risk landscape than under the simple approach. Many who take the simple approach have been quite happy to reject SRM as a risky technology without major benefit in mitigating X-Risk, and have been happy to do a “quick and dirty” risk-risk analysis based on simple models of how risks propagate. As we explore the more complex feedbacks, interactions and cascades of risk, the validity of such simple analyses is, I think, brought into question, highlighting the need for the complex paradigm in this field.

Finally, it's important to note that it is not obvious how any given upstream action, such as research, contributes to risk. Even doing research helps to rearrange the risk landscape in unpredictable ways, so even answering whether research makes deployment of the technology more likely is really hard, as research spurs governance, impacts tensions, and impacts our ability to discover different technologies and make the system more resilient. Once again, the complex web of impacts of research needs to be untangled. This is tricky, and needs to be done very carefully. But given EA dominates the X-Risk space, it's not something we can shirk.

I also think it's important to note that these approaches, whilst often working from different intellectual assumptions, have their differences manifest predominantly culturally rather than intellectually. In many ways, the two approaches converge; this is perhaps surprising, and maybe acts as a caution against my grandstanding about fundamental axiomatic differences. For example, the worry about an intelligence explosion is, at its core, I think, a worry about a threshold beyond which we move into a dangerous unknown, with systems smarter than us that may, likely by some unknown mechanism, kill us. In many ways, it should sit more comfortably inside the “complex” paradigm, without a well-thought-out kill mechanism, acknowledging the irreducible underdetermination of the future, and how powerful technologies, phenomena and structures within our complex interconnected system are likely to contribute hugely to risk, than in the simple paradigm. Similarly, the work that thinkers like Luke Kemp, who ostensibly aligns more with the “complex” paradigm, have done on the agents of doom, which tries to identify the key actors that drive risk (mostly as drivers of hazards), probably fits more neatly in the “simple” paradigm than the complex one. I think these cultural splits are important as well, and they probably imply that a lot of us from across the spectrum are missing potentially important contributors to existential risk, irrespective of our paradigm.

As a coda to this talk, I would like to briefly summarise Adrian Currie's arguments in his wonderful paper "Existential Risk, Creativity and Well Adapted Science." This is relevant perhaps at a level above what I have been discussing, concerning what "meta-paradigm" we should take. He suggests that all research has a topographical landscape with "peaks" representing important findings. Research is thus a trade-off between exploring and exploiting this landscape. I think the simple approach is very good at exploiting certain peaks, but particularly bad at understanding the topography of the whole landscape, which I think the complexity paradigm is much better at. But as Currie convincingly argues, this probably isn't sufficient. X-Risk studies is in a relatively novel epistemic situation: the risks it deals with are unique, in the words of Carl Sagan "not readily amenable to experimental verification… at least not more than once." The systems are wild and thus don't favour systematic understanding. We are not just uncertain as to the answers to key questions, but also uncertain as to what to ask. It is a crisis field, centred around a goal rather than a discipline; in fact, we are uncertain which disciplines matter most. All of this leaves us in an epistemic situation where, in many ways, uncertainty, and thus creativity, should be at the core of our approach, both to get an understanding of the topography of the landscape and because it stops us getting siloed. On the spectrum of exploring vs exploiting, exploratory approaches should be favoured, because we should reasonably see ourselves as deeply uncertain about nearly everything in X-Risk. Even if people haven't managed to "change our minds" on a contributor to risk, experience should tell us that we are likely to be wrong in ways that no one yet understands, and there are quite probably even bigger peaks out there.
We should also be methodological omnivores, happy to use many methodologies, tailoring these to local contexts, and taking a pluralistic approach to techniques and evidence, increasing the epistemic tools at our disposal. Both of these imply the need for pluralism, rather than hegemony of any one approach. I am very worried that EA's culture and financial resources are pushing us away from creativity and towards conservatism in the X-Risk space.

In conclusion, this talk hasn't shown you that the simple approach is wrong, just that it provides a thoroughly incomplete picture of the world, one insufficient for dealing with the complexity of many drivers of existential risk. This is why I, and many others, call for greater methodological diversity and pluralism, including a concerted effort to come up with better approaches to complex and systemic risk. The simple approach is clearly problematic, but it is far easier to make progress on problems using it; it's like the Newtonian physics of existential risk studies. But to get a more complete picture of the field, we need a more complex approach. Anders Sandberg put this nicely, seeing the "risk network" as having a deeply interconnected core, where the approach of irreducible complexity must dominate; a periphery with fewer connections, where a compounding risk approach can dominate; and a far-off periphery, with relatively few connections between hazards, where the simple, hazard-centric approaches dominate. The question, probably an axiomatic one more than anything else, is where the greatest source of risk lies. But both methods of analysis clearly have their place.

As EA dominates the existential risk field, it is our responsibility to promote pluralism, through our discourses and our funding. Note, as a final point, this isn't the same as openness to criticism based on "change my mind" around a set of rules which a narrow range of funders and community leaders set. Rather, we need pluralism, where ideas around existential risk coexist rather than compete, encouraging exploration, creativity and, in the terms of Adrian Currie, "methodological omnivory." There are so few evidentiary feedback loops that a lot of our answers to methodological or quasi-descriptive questions tend to be based on our prior assumptions, as there often isn't enough evidence to hugely shift these, or the evidence can be explained in multiple ways. This means our values and assumptions hugely impact everything, so having a very small group of thinkers and funders dominate and dictate the direction of the field is dangerous, essentially no matter how intelligent and rational we think they are. So we need to be willing not just to tolerate but to fund and promote work on X-Risk that we individually may think is a dead end, and to cede power to increase the creativity possible in the field, because under such uncertainty, promoting creativity and diversity is the correct approach, as hard as that is to accept. How we square this circle with our ethos of prioritisation and effectiveness is a very difficult question, one I don't have the answer to. Maybe it's not possible; but EAs seem to be very good at expanding the definition of what is possible in combatting the world's biggest problems. This is a question that we must pose, or we risk trillions of future lives. Thank you.




Thanks so much for posting this Gideon. I like your way of framing this into these two loose clusters, and especially your claim that it is good to have both. I completely agree. While my work is indeed more within the simple cluster, I feel that a fight over which approach is right would be misguided.

All phenomena can be modelled at lesser or greater degrees of precision, with different advantages and disadvantages of each. Often there are some sweet spots where there is an especially good tradeoff between accuracy and ability to actually use the model. We should try to find those and use them all to illuminate the issue.

There is a lot to be said  for simple and for complex approaches. In general, my way forward with all kinds of topics is to start as simple as possible and only add complexity when it is clearly needed to address a glaring fault. We all know the truth is as complex as the universe, so the question is not whether the more complex model is more accurate, but whether it adds sufficient accuracy to justify the problems it introduces, such as reduced usability, reduced clarity,  and overfitting. Sometimes it clearly is. Other times I don't see that it is and am happy to wait for those who favour the complex model to point to important results it produces.

One virtue of a simple model that I think is often overlooked is its ability to produce crisp insights that, once found, can be clearly explained and communicated to others. This makes knowledge sharing easier and makes it easier to build up a field's understanding from these crisp insights. I think the kind of understanding you gain from more complex models is often more a form of improving your intuitions and is harder to communicate, and doesn't typically come with a simple explanation that the other person can check to see if you are right without spending a similar amount of time with the model.

I really appreciate the explorative, curious, open and constructive approach Toby!

On 'what are some important results that a complex model produces', one nice example is a focus on vulnerability. That is, focus on improving general resilience, as well as preventing and mitigating particular hazards. This has apparently become best practice in many companies - e.g. rather than just listing hazards, focus also on having adequate capital reserves and some slack/redundancy in one's supply chains.

Matt Boyd and Nick Wilson have done some great complex-model-ish work looking at the resilience of island nations to a range of scenarios. One thing that turned up is that Aotearoa New Zealand has lots of food production, but transport of that food is reliant on road transport, and the country closed its only oil refinery. Having an oil refinery might increase its resilience/decrease its vulnerability.

I don't think that point would have necessarily come up in a 'simple-model' approach, but it's concrete, tractable, important and plausibly a good thing to suggest the govt act on.

Of course, you touch on vulnerabilities in The Precipice. Nevertheless, it's fun to wonder what a sequel would look like with each chapter framed around a critical system/vulnerability (food, health, communications) rather than each around a particular hazard.


Thanks for this, Gideon. Having read this and your comments on my climate report, I am still not completely sure what the crux of the disagreement is between us. I get that you disagree with my risk estimates, but I don't really understand why. Perhaps we could discuss on here, if you were up for it.

I obviously think we need more time to flesh out real cruxes, but I think our cruxes are probably a few fold:

  • I think I am considerably less confident than you in the capacity of the research we have done thus far to confidently assess climate's contribution to existential risk. To some degree, the sort of evidence you're happier relying on to make negative claims (i.e. not a major contributor to existential risk), I am much less happy relying on, as I think it often fails (and maybe always will fail) to account for plausible major contributors arising from the complexity of the system. This is an advantage of the simple approach, as Toby lays out earlier, but I'm more sceptical of its usage to make negative rather than positive claims.
  • I think you are looking for much better thought-out pathways to catastrophe than I think is appropriate. I see climate as something acting to promote serious instability in a large number of aspects of a complex system, which should give us serious reason to worry. This probably means my priors on climate are immediately higher than yours, as I'm of the impression you don't hold this "risk emerges from an inherently interconnected world" ontology. This is why I've often put our differences down to our ontology and how we view risk in the real world.
  • Because of my ontology and epistemology, I think I'm happier to put more credence on things like past precedent (collapses triggered by climate change, mass extinctions etc.) and decently formulated theory (planetary boundaries for GCR (although I recognise their real inherent flaws!), the sort of stuff laid out in Avin et al 2018, what's laid out in Beard et al 2021 and Kemp et al 2022). I'm also happier to take on board a broader range of evidence, and to look more at things like how risk spreads, vulnerabilities/exposures, feedbacks, and responses (and the plausible negatives therein), which I don't find your report convincingly deals with, partially because they are really hard to deal with and partially because, particularly for the heavy tails of warming and other factors, there is a very small amount of research, as Kemp et al lays out. Correct me if I'm wrong, but you see the world as a bit more understandable than I do, so simpler, quantitative, more rational models are seen as more important for making any positive epistemic claim, and so you would somewhat reject the sort of analysis that I'm citing.
  • I'm also exceptionally sceptical of your claim that if direct risks are lower, then indirect risks are lower; although I would reject the use of that language full stop.

I also think it's important to note that I make these claims (mostly) in the context of X-Risk. I think in "normal" scenarios, I would fall much closer to agreeing with you on a lot of things. But I think I have both a different ontology of existential risk (emerging mostly out of complex systems, so more like what's laid out in Beard et al 2021 and Kemp et al 2022) and, perhaps more importantly, a more pessimistic epistemology. As (partially) laid out when I discuss Existential Risk, Creativity and Well Adapted Science in the talk, I think that with existential risk, negative statements (this won't do this) actually have a higher evidentiary burden than positive statements of a certain flavour (it is plausible that this could happen). Perhaps it is because my priors of existential risk from most things are pretty low (owing, I think, in part to my pessimistic epistemology) that it just takes much more evidence to cause me to update downwards than to think "huh, this could be a contributor to risk actually!"

Does this answer our cruxes? I know this doesn't go into object level aspects of your report, but I think this may do a better job at explaining why we disagree, even when I do think your analysis is top-notch, albeit with a methodology that I disagree with on existential risk.

I also think it's important that you know that I'm still not quite sure if I'm using the right language to explain myself here, and that my answer here is about why I find your analysis unconvincing, rather than it being wrong. Perhaps as my views evolve I will look back and think differently. Anyway, I really would like to talk to you more about this at some point in the future.

Does this sound right to you?


Thanks yes that is helpful. Perhaps we can now get into the substance. 

  • It is noteworthy how different your estimates of the x-risk of climate change are to all other published attempts to quantify the aggregate costs of climate change. All climate-economy models imply not just that climate change won't cause an existential catastrophe, but that average living standards will be higher in the future despite climate change. When people try to actually quantify and add up the effect on things like agriculture, sea level rise and so on, they don't get anywhere near to civilisational collapse, but instead get a counterfactual reduction in GDP on the order of 1-5% relative to a world with no climate change (not relative to today). 
  • I don't think past precedent can take us very far here, since there are no precedents of climate change causing human extinction, though anthropics is obviously an issue here. In the report, I also discuss how in the last 160 million years, climate change has not been associated with elevated rates of species loss. Humans also survive and thrive in very diverse environmental niches at the moment, with an annual average temperature of 10ºC in the UK, but closer to 25ºC in South Asia. Within this annual average, there is also substantial diurnal and seasonal variation. It's around 5ºC in the UK now but will reach 20ºC in the summer. Humans have survived dramatic climate change over the last 300,000 years, and our hominid ancestors also survived when the world was about 4ºC warmer. It's hard to see why climate change of 2-4ºC would make such a massive difference, so as to constitute an existential catastrophe
  • I disagree about planetary boundaries for reasons I discuss in the report. I have examined several of the boundaries in depth and they just seem to be completely made up. 
  • It is not true that there is a small amount of research on the tails of warming. Business as usual is now agreed to be 2.5ºC with something like a 1-5% chance of 4ºC. The impacts literature has in fact been heavily criticised for focusing too much on the impacts of RCP8.5, which implies 5ºC by 2100. 
  • The approach that you advocate for seems to me to establish not just that climate change is a much bigger risk than commonly recognised but also that many other problems are as well. Other problems also have similar or larger effects to climate change when calculated in the usual way used in economic analysis. This includes things like mispricing of water, immigration restrictions, antimicrobial resistance, underinvestment in vaccines, a lot of things that affect the media, the prohibition of GM food, underinvestment in R&D, bad monetary policy, economists focusing on RCTs, housing regulation, the drug war etc. If climate change is a cascading risk on the order of 0.01pp to 1pp, then these problems should be as well. But if they are as well, then total existential risk from non-AI and non-bio sources is way way higher than commonly recognised and doom is almost certain. The reasoning suggests that the world is so fragile that it is unlikely that we could even have got to the current level of technological development. 
  • I would view a lot of my report as assessing cascading risk. I discuss pathways such as climate change => civil conflict => political instability => interstate war. I also discuss effects on migration and the spillover effects this might have. What difference would a cascading risk approach take here? Related to this, I don't view causal chains like this as very understandable and I say so in the report. But we still have ideas about how big effects some things have. The causes of war between the US and China or Russia and China 

To answer each of your points in turn

  • I think it's important to note that much of the literature looking at those estimates for extreme scenarios (not just extreme levels of warming, but other facets of the extremes as well) has suggested that current techniques for calculating climate damage aren't great at the extremes, and tend to function well only when close to the status quo. So we should expect that these models don't behave appropriately under the conditions we are interested in when exploring GCR/X-Risk. This has pretty commonly been discussed in the literature on these things (Beard et al 2021, Kemp et al 2022, Wagner & Weitzman 2015, Weaver et al 2010 etc.)
  • I still think past events can give us useful information. Firstly, climate change has been a contributing factor to A LOT of societal collapses; whilst these aren't perfect analogies and do show a tremendous capacity of humanity to adapt and survive, they do show the capacity of climate change to contribute to major socio-political-technological crises, which may act as a useful proxy for what we are trying to look for. Moreover, whilst a collapse isn't an extinction, if we care about existential risk, we might indeed be pretty worried about collapse if it makes certain lock-ins more or less likely, but to be honest that's a discussion for another time. Moreover, whilst I think your paleoclimatic argument is somewhat reasonable, given the limited data here (and your reliance on a few data points plus a large reliance on a single study of plant diversity (which is fine, by the way, we have limited data in general!)), I don't find it hugely comforting. Particularly because climate change seems to have been a major factor in all of the big 5 mass extinction events, and the trends that Song et al 2021 note in their analysis of temperature change and mass extinction over the Phanerozoic. They mostly use marine animals. When dealing with past processes, explanations are obviously difficult to disentangle, so there are reasons to be sceptical of the causal explanatory power of Song's analysis, although obviously similar uncertainty should be applied to your analysis, particularly the claims of a fundamental step change 145 million years ago.
  • Whilst planetary boundaries do have their flaws, and to some degree where they are set is quasi-arbitrary, as discussed in the talk, something like this may be necessary when acting under such deep uncertainty; don't walk out into the dark forest and all that. Moreover, I think your report fails to argue convincingly against the BRIHN framework that Baum et al 2014 developed, in part in response to the Nordhaus criticisms which you cite.
  • Extreme climate change is not just RCP 8.5/SSP5-8.5; it's much broader than that. Kemp et al 2022's response to Burgess et al's comment lays out this argument decently well, as does Climate Endgame itself.
  • I don't really understand this point, particularly in response to my talk. I explicitly suggest in my talk that systemic risks, which those could all contribute to, are very important. The call for more complex risk assessment (the core point of the talk, alongside a call for pluralism) rests on there likely being significant limits to conventional economic analysis in analysing complex risk. The disagreement on this entire point seems to be explained reasonably well by the difference between the simple/complex approaches.
  • I think your causal pathways are too simple and defined (i.e. they are those 1st- and 2nd-order indirect impacts), and probably don't account for the ways in which climate could contribute to cascading risk. Whilst of course this is still under-explored, some of the concepts in Beard et al 2021 and Richards et al 2021 are a useful starting place, and I don't really see how your report refutes the concepts around cascades they bring up. I'd also agree these cascades are really hard to understand, but I struggle to see how that fact acts in favour of your approach and conclusions?

I hope this has helped show some of our disagreements! :-) 

  • I agree that climate-economy models aren't good at some types of extremes, but I think there are different versions of this argument, some of which have become weaker over the years. One of Weitzman's points was that there was a decidedly non-negligible chance of more than 6ºC and our economic models weren't good at capturing how bad this would be and so tended to underestimate climate risk. I think this was basically right at the time he was writing. But since 5ºC now looks less and less likely,  this critique has less and less bite. Because there is such a huge literature on the impact of 5ºC, the models now in principle have a much firmer foundation for damage estimates. eg the Takakura 2019 paper that I go on about in the report uses up to date literature on a wide range of impact channels, but still only gets like a 5% counterfactual reduction in welfare-equivalent of GDP by 2100, and so probably higher average living standards than today. 
    • Another version of this is that the models aren't good at capturing tipping points. I agree with this, but I also find it difficult to see how this would make a dramatic difference to the damage estimates if you actually drill down into the literature on the impact of different tipping points. Tipping points that might cause different levels of warming are not relevant to damage estimates, so the main ones that seem relevant are ice sheet collapse, regional precipitation and temperature changes, such as changes in monsoons, which might be caused eg by collapse of the AMOC. For the impacts discussed in the literature, it is difficult to see how you get anywhere close to an existential catastrophe if any of these things happen. 
    • Aside from that, it is noteworthy that some economic models actually try to capture the literature on the  impact of warming of 5ºC on things like agriculture, sea level rise, temperature-related deaths, lost productivity from heat etc. There is a group of scientists who say that 3ºC/4ºC is catastrophic on the basis of what the scientific literature says about these impacts. The models strongly suggest that they are wrong, and it is not clear what their response is.
    • All this being said, I am sympathetic to some critiques of the economic models, eg a lot of the Nordhaus stuff. When I was writing the report, I had thought about putting no weight on them at all, but after digging a bit I changed my mind. I think some of the models make a decent stab at quantifying aggregate costs. 
  • I agree that climate changes have contributed at least to some civilisational trauma throughout history. The literature on this suggests that climate change has been correlated with local civilisational trauma. But: (a) local collapse is a far cry from global collapse; (b)  most of the time this was due to cooling rather than warming; (c) the mechanism was usually damage to agricultural output, but there is now far more slack in the system, and we have massively better technology to deal with any disruption; (d) we in general have far more advanced technology, and whereas in the past >90% of the workforce would have been employed in agriculture, now <20% is (or whatever); (e) the relationship between climate change and civilisational turmoil breaks down by the industrial revolution, which provides some support for point (c). 
  • The paleoclimate point doesn't rely on one datapoint: it's data from 160 million years of climatic and evolutionary history. Massive climate change over that period didn't cause species extinctions, as some might have expected it to.
    • As you say, with climate change, the extinctions usually happened among marine life, due to ocean anoxia and ocean acidification, and it's hard to see the mechanism by which CO2 pollution would cause land-based extinctions, unless something else weird happens at the time, such as a volcanic eruption puncturing through salt deposits as happened at the Permian.
    • For the level of warming of 2-4ºC that now looks likely, it's really hard to see why it would cause damage similar to, e.g., the Permian, given that the effect is an order of magnitude smaller.
  • I don't think they are quasi-arbitrary, they are totally arbitrary. eg they propose a planetary boundary for biodiversity intactness which by their own admission is made up. The boundary also can't be real since various countries across Eurasia  completely destroyed their pre-modern ecosystems after the agricultural revolution without causing anything like civilisational collapse. 
    • A lot of people criticise planetary boundaries for being political advocacy. The clearest evidence for this is Steffen et al proposing a supposed planetary boundary for a 'Hothouse earth' at 2ºC (which happens to be the Paris target) on the basis of no argument.
    • When we are acting under uncertainty I think we should use expected value. Alleged boundaries might be a useful Schelling point for political negotiation (like the 2ºC threshold), but it's not a good approach for actually quantifying risk. Another downside of a boundary is that it implies that anything we do once we pass the boundary is pointless.
  • Kemp, Jehn and others claim that the effect of warming of more than 3ºC is 'severely neglected'. But all of the impacts literature explores the effect of rcp8.5 by 2100, which implies 4-5ºC of warming. Jehn's search strategy uses temperature mentions to measure neglect, but if you use RCP mentions, you don't get the same result. 
  • My argument here was that I think your argument proves too much - it suggests that the world is extremely fragile to eg agricultural disruption and heat waves that happen all the time. Given that the world was eg a lot poorer in 1980 and so had a lot lower adaptive capacity, why didn't various weather disasters trigger cascading catastrophes back then? The number of people dying in weather-related disasters has declined massively over time, so we should expect the cascade to have happened in the 1920s and less so in the future?
    •  I also don't see why cascading risk would change the cause ranking among top causes. Why aren't democratised bioweapons and AI also cascading risks? 
  • What are the causal pathways that might contribute to conflict risk that you think I have missed? I don't really get what is meant to happen that I haven't already discussed. I talk about all of the contributors to war outlined in textbooks about war and combine that with the literature on climate impacts. It is just really a stretch to make it an important contributor to US-China dynamics. 

Hi John, sorry this has taken a while. 

  • In particular, climate-economy models still do badly at the heavy tail, not just of warming, but of civilisational vulnerability etc., again presenting a pretty "middle of the road" rather than heavy-tailed distribution. The sort of work in Beard et al 2021, for instance, highlights something I think the models pretty profoundly miss. Similarly, I'd be really interested in research similar to Mani et al 2021 on extreme weather events and how these may change due to climate change.
  • I don't see why the models discount the idea that there is a low but non-negligible probability of catastrophic consequences from 3-4 degrees of warming. What aspect of the models? I'm reticent to rely on things like damage functions here, as they don't seem to engage with the possible heavy-tailedness of damage. Whilst I agree that the models are probably decent approximations of reality, I'm just not really sure they are useful at telling us anything about the low-probability, high-impact scenarios that we are worried about here.
  • Whilst I agree there are reasons to think our vulnerability is less, there are clear reasons to think that with a growing, interconnected (and potentially fragile) global network and economy, our vulnerability is increasing, meaning that whilst the past collapse data might not be prophetic, there is at least value in it; after all, we are in a very evidence-poor environment, meaning that I would be reticent to dismiss it as strongly as you seem to. And whilst it is true our agricultural system is more resilient, there is still a possibility of multiple breadbasket failures etc. caused by climate change, and Beard et al and Richards et al both explore plausible pathways to this. Again, whilst the past collapse data is definitely not a slam dunk in my favour, I would at least argue it is an update nonetheless. I think you might argue the fact that none led to human extinction makes that data an update in your direction, and I think your view on this depends on whether you see collapse, GCR and extinction on a continuum or not; I broadly do, and I assume you broadly don't?
  • When I said one data point, I really meant one study. The reason I say this is that the studies cited cover different species/species groups. In your comment, you don't seem to engage with Song et al 2021. Kaiho et al 2022 also shows a positive relationship between warming and extinction rate. Moreover, I think your argument takes an overly confident view of our understanding of kill mechanisms; the absence of all the factors you speculate were important in past mass extinctions doesn't make those events useless as evidence. I think a position like Keller et al 2018 (PETM as the best case, KPg as the worst case) is probably useful for looking at this (only using modern evidence!). Once again, this is an attempt by me, in a low-evidence situation, to make best use of the evidence available, and I don't find your points compelling enough to make me think that this past precedent can't be informative.
  • On the planetary boundaries, you don't seem to be engaging with what I'm saying here, which is mostly alluding to the Baum et al paper on this. Moreover, even if you think we are to use EV, what are you basing the probabilities on? I assume some sort of subjective Bayesianism, in which case you'll have to tell me why I shouldn't put a decently high (>1%) prior on moving beyond certain Holocene boundaries posing a genuine threat to humanity? That seems perfectly reasonable to me.
  • I'm not really sure I understand the argument? Whilst in some ways the world has indeed got less vulnerable, in other ways it has got more connected, more economically vulnerable to natural disasters etc. Cascading impacts seem to run more along these lines than others. Moreover, if you only had a 5% probability of such a cascade occurring over a century, and we have hardly had a hyper-globalised economy for even that long, why would you expect it to have happened already? Your statements here seem pretty out of step with my actual probabilities. And as I talk about in my talk, I also see problems from AI, biorisk and a whole host more. That's why this talk, and this approach, is seriously not just about climate change; the hope is to add another approach to studying X-Risk.
  • I'm also pretty interested in your approach to evidence on X-Risk. I should say from the outset that I think climate change is unlikely to cause a catastrophe, but I don't think you have provided compelling evidence that the probability is exceptionally small. Your evidence often seems to rely on the very things that we think ought to be suspect in X-Risk scenarios (economic models, continued improved resilience, best-case scenario analogies etc.), and you seem to reject some things that might be useful for reasoning in such evidence-poor environments (plausibly useful but somewhat flawed historical analogies, foresight, storytelling, scenarios etc.). Basically, you seem to have a pretty high bar for evidence to be worried about climate change, which, whilst I think is useful in general, I'm just not sure is appropriate in such an evidence-poor environment as X-Risk, including climate change's contribution to it. It's pretty interesting that you seem very willing to rely on much more speculative evidence for AI and biorisk (e.g. probabilistic forecasts which don't have track records of working well over such long time scales), and I genuinely wonder why this is. Note that such more speculative approaches (in this case superforecasters) gave a 1% probability of climate change being a necessary but not sufficient cause of human extinction by 2100, and gave an even higher probability to global catastrophe by 2100, which in turn carries some probability of later leading to extinction. Whilst I myself am somewhat sceptical of such approaches, I'd be interested to hear why you seem to accept them for bio and AI but not climate. Is it because you see evaluation of the existential risk from climate change as a much more evidence-rich environment than for bio/AI?
  • I'm not sure they're middle of the road on civilisational vulnerability. It would be pretty surprising if extreme weather events made a big difference to the overall picture. For the kinds of extreme weather events one sees in the literature, it's just not a big influence on global GDP. How bad would a hurricane or flood have to be to push things from 'counterfactual GDP reduction of 5%' to civilisational collapse?
  • I don't think they fully discount/ignore the possibility of catastrophe at 3-4ºC. In part this is just an outcome of the models and of the scientific literature. There are no impacts that come close to catastrophe in the scientific literature for 3-4ºC. I agree they miss some tipping points, but looking at the scientific literature on those, it's hard to see how they would make a big difference to the overall picture.
  • I haven't read those papers and unfortunately don't have time to do so now. My argument there doesn't rely on one study but on a range of studies in the literature for different warm periods. The Permian was a very extreme and unusual case because it involved such massive land-based extinctions, driven by the release of halogens, which is not relevant to future climate change. Also, both the Permian and PETM were extremely hot relative to what we now seem to be in for (17ºC vs 2.5ºC).
  • I'm not sure I see how I am not engaging with you on planetary boundaries. I thought we were disagreeing about whether to put weight on planetary boundaries, and I was arguing that the boundaries just seem made up. Using EV may have its own problems but that doesn't make planetary boundaries valid. 
  • I don't really see how the world now is more vulnerable to any form of weather event in any respect than it has been at any other point in human history. Society routinely absorbs large bad weather events; they don't even cause local civilisational collapse any more (in middle- and high-income countries). Deaths from weather disasters have declined dramatically over the last 100 or so years, which is pretty strong evidence that societal resilience is increasing, not decreasing. In the pre-industrial period, all countries suffered turmoil and hunger due to cold and droughts. This doesn't happen any more in countries that are sufficiently wealthy. Many countries now suffer drought, almost entirely due to implicit subsidies for agricultural water consumption. It is very hard to see how this could lead to, e.g., collapse in California or Spain.
  • Can you set out an example of a cascading causal process that would lead to a catastrophe? 
  • I'm not sure that there is some meta-level epistemic disagreement, I think we just disagree about what the evidence says about the impacts of climate change.  In 2016, I was much more worried than the average FHI person about climate change, but after looking at the impacts literature and recent changes in likely emissions, I updated towards climate change being a relatively minor risk. Comparing to bio for instance, after reading about trends in gene synthesis technologies and costs, it takes about 30 minutes to see how it poses a major global catastrophic risk in the coming decades. I've been researching climate change  for six years and struggle to see it. I am not being facetious here, this is my honest take.

Thanks for this it is useful. What is your estimate of the existential risk due to climate change? I obviously have it very low, so it would be useful to know where you are at on that.  Could you explain what the main drivers of the risk are, from your point of view? Then we can get into the substance a bit more

I suppose the problem with that question, from my perspective, is that I don't think "existential risk due to X" really exists, as I explain in the talk. In terms of the number of percentage points it raises overall risk by, I would put climate change between <0.01% and 2%, and I would probably put overall risk at between 0.01% and 10% or something. But I'm not sure that I actually have much confidence in many approaches to X-Risk quantification (as per Beard et al 2020a), even if this framing does make quantification easier. Some of the main contributions to risk from climate (though note a number may also be unknown or unidentifiable):

  • Weakening local, regional and global governance
  • Water and food insecurity
  • Cascading economic impacts
  • Conflict
  • Displacement
  • Biosphere integrity
  • Responses increasing systemic risk
  • Extreme weather
  • Latent risk

Mostly these increase risk by:

  • Increasing our vulnerability
  • Multiple stressors coalescing into synchronous failure
  • The major increase in systemic risk
  • The responses we take
  • Cascading effects leading to fast or slow collapse then extinction
  • Acting as a "risk factor"

Hi Gideon,

Sometimes we have already gone in the forest, and we can hear howling, and we have no idea what is going on, and we are posed with a choice of things to do, but no option is safe. We are stuck between a rock and a hard place. Under such deep uncertainty, how can we act if we refuse to reduce the complexity of the world? We can’t just play it safe, because every option fails a precautionary principle. What do we do in such cases?

I recognize that your questions may be rhetorical, but here are some answers:

  • 1. prioritize, by type of harm, the harms to avoid. The classic approach to understanding harm is to rank death as the greatest harm, with disease and other harms less harmful than death. I don't agree with this but that's not relevant. Some explicit ranking of harms to avoid clarifies costs associated with different actions.
    • NOTE: The story of climate change is one of rich countries producing most of the anthropogenic GHGs, damaging ecosystems more, threatening carbon sinks more, etc. Proactive actions can avoid more extreme harms but have known and disliked consequences, particularly for the wealthier of two parties compromising to save both (for example, societies, countries, or interest groups).
  • 2. recognize the root causes. If you cannot play it safe, then harms will occur no matter what. In that case, recognize the root causes of your quandary so that civilization has an opportunity not to repeat the mistake that got you where you are. In the case of climate change, I perceive a root cause in the simple equation impacts = population * per capita consumption. You can get fancy with rates or renewable resources or pollution sinks, but basically: consume less or shrink the population.
  • TIP: The problem reduces to the population size of developed countries offering plentiful public goods while allowing citizens to accumulate private goods. I've seen the suggestion to increase public goods and reduce private consumption. Another idea is to offer consistent family planning emphasizing women's health and economic opportunities as well as free birth control for all, such as free condoms and free vasectomies for men.
  • 3. find the neglected differences between actual, believed, and claimed assertions. As the situation is evolving into an existential crisis, differences appear between public claims, believed information, and the actual truth. During the crisis, the difference between beliefs and the truth gets less attention. Truth-seeking is ignored or assumed complete. You can buck that trend.
    • EXAMPLE: Right now, the difference to correct could be between claims and beliefs (for example, politicians lying about climate change), but another difference that is more neglected is between truths and beliefs about the lifestyle implications of successfully mitigating climate change. That is where we are now, I believe. People in the developed world are afraid that mitigating climate change for the global population will wreck their modern lifestyle. In many cases, I suspect those fears are overblown.
    • CAUTION: In a future of real extremes, involving the plausible loss of 100's of millions of lives, don't (claim to) expect that obvious solutions like "let 100 million climate migrants into the US over 5 years" will be easily accepted. Instead, expect the gap between claims and beliefs to widen as hidden agendas are acted upon. Climate change issues of rights, fairness, justice, and ethics, not just economics or technology, have been consistently neglected. The endgame looks to be a harmful one.
  • 4. close information gaps wherever you can: Earth science can be confusing. You can follow most of a discussion easily but then lose understanding at some key point because the researcher is being a geek and doesn't know how to communicate their complicated information well. Sometimes there's no way to make the presentation any simpler. Sometimes, there isn't enough information or the information is aged out but not updated fast enough. Policy guidance appears to stick longer than real-time measurements of earth system changes allow. This is a point of frustration and a policy bottleneck that actually comes from the research side. Examples of such issues include:
    • physical modelling parameters of tipping elements (for example, Greenland melt) are missing from widely cited computer models predicting climate change impacts (for example, sea-level rise). The implications of measurement data with respect to those tipping elements go missing from policy recommendations based on the computer models.
    • losses of carbon sinks that are tipping elements are not factored into carbon budget calculations at rates reflective of current and short-term expected changes to those sinks. Neither are other forcings on tipping elements (for example, people clearing the Amazon for farming).
    • smaller-scale features relevant to ocean current modeling or weather changes due to climate. These require a model "grid size" of about 1 km, in contrast to the 100x larger grid sizes used for modeling climate, or thereabouts, according to one discussion I followed. The gist for me is that modeling climate change in the ocean, or as it affects weather, in real time is not happening effectively yet.
    • correct interpretation of statistics, units, terminology or research purpose prevents confusion about limits, measurements, and tracking of changes in atmospheric heating, tipping element significance, and the significance of concepts like global average surface temperature (GAST). There are many examples, some of which baffled me, including:
      • the relationship between gigatons and petagrams
      • the difference between CO2 and CO2e
      • amounts referring to carbon (C) vs carbon dioxide (CO2)
      • the relationship between GAST increases and regional temperature increases
      • the difference between climate and weather
      • the rate of warming of the Arctic
      • the relationship between heating impact and decay rate of CH4 (methane)
      • the % contribution of land vs ocean carbon sinks to total carbon uptake
      • the hysteresis effect in tipping element models
      • the relationship between tipping elements, tipping points, and abrupt climate change.
      • the precise definition of "famine" and "drought"
      • the nature of BECCS and DACCS solutions at this point in time
      • the intended meaning of "carbon budget" versus its commonly understood meaning of "carbon that is safe to produce"
      • the pragmatic meanings of "energy conservation" or "natural resources" or "carbon pollution"
      • the relationship between SDGs, SSPs, RCPs, SPAs, CMIP5 and 6 models, and radiative forcing (still confusing me)
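To make two of the conversions above concrete, here is a minimal sketch. The unit relationships themselves (gigatonne/petagram equivalence, and the C-to-CO2 molar-mass ratio) are standard; the 300 GtC input is purely illustrative, not a claim about any actual carbon budget.

```python
# 1 gigatonne (Gt) = 1e9 tonnes = 1e15 grams = 1 petagram (Pg):
# the two units are the same quantity under different names.
Gt_in_Pg = 1.0

def c_to_co2(mass_c):
    """Convert a mass of carbon (C) to the equivalent mass of CO2.

    Molar masses: C = 12 g/mol, CO2 = 44 g/mol, so the ratio is 44/12.
    """
    return mass_c * 44.0 / 12.0

# e.g. a hypothetical budget of 300 GtC corresponds to 1100 GtCO2.
print(c_to_co2(300))  # 1100.0
```

This 44/12 factor is one common source of confusion: the same "budget" can be quoted as a figure in GtC or a figure ~3.7x larger in GtCO2.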

Here's a thought about the use of the word "ontology". I actually chose that word myself for a criticism I submitted to the Red Team Contest this year. I think no one has read it. However, I suspect that its use by you, someone who gets noticed, could put EAs off, since it is rarely used outside discussions of knowledge representation or philosophy. That said, I agree with your use of it. However, if you have doubts, other words or phrases with similar meaning to "ontology" include:

  • model of the world
  • beliefs about the world
  • idea of reality
  • worldview
  • reality (as you understand it)

In a revision of my criticism (still in process), I introduce a table of alternatives:

| EA terminology | probabilism-avoiding alternative | example of use |
| --- | --- | --- |
| possible | plausible | It is plausible that the planet will reach 6C GAST by 2100. |
| impossible | implausible or false | It is false that the planet reached -1C GAST in 2020. |
| likely or probable | expected ... if/assuming ... | It is expected that solar power development will add to global energy production, rather than substitute for existing production, assuming business as usual continues. |
| unlikely | not expected | It is not expected that countries will stay under their carbon budgets between now and 2030. |
| risk | danger | The existential danger from climate destruction is apparently still controversial. |
| uncertain | unknown | The danger of human extinction from the proximate cause of climate change is unknown. |
| uncertainty | ignorance | At the moment, there is some unavoidable ignorance reflected by inferences of changes in weather from changes in climate. |
| chance that | opportunity for | There is an opportunity for the faith of techno-optimists to be vindicated. |
| usually | typically | Typically the corrupt politicians and selfish business barons work against the common good. |
| update | change or constrain | I constrained my beliefs about Greenland resilience against melting after learning that Greenland altitude losses could turn its snow to rain. |

I'm not recommending those changes to your vocabulary, since you are dealing with foresight and forecasting while juggling models from Ord, Halstead, and other EAs. However, if you do intend to "take a break" from thinking probabilistically, consider some of the alternatives I offered here. It can also be helpful to make these changes when your audience needs to discuss scenarios as opposed to forecasts.

I have not spent much time studying geo-engineering, but I have formed the impression that climate scientists look at polar use of water vapor for marine cloud brightening with less fear than the use of aerosols like diamond dust elsewhere in the world. EDIT: Apparently Marine Cloud Brightening is a local effort with a much shorter residence time, giving more time for gathering feedback, whereas aerosol dusts are generally longer-term and potentially global.

Also I recall a paint that is such a brilliant white that its reflectivity should match that of clean snow. If the world's roofs were painted with that paint, could that cool the planet through the albedo effect, or would the cooling effect remain local? I need some clarity on the albedo effect, but I'll leave the math to you for the moment, and best of success with your efforts!

Hi John, thanks for the comment, I've DM'd you about it. I think it may be easier if we did the discussion in person before putting something out on the forum, as there is probably quite a lot to unpack, so let me know if you would be up for this? 

I worry that a naïve approach to complexity and pluralism is detrimental, but agree that this is important. As you said, "the complex web of impacts of research also need to be untangled. This is tricky, and needs to be done very carefully."

I also think that you're preaching to the choir, in an important sense. The people in EA working on existential risk reduction are aware of the complexity of the debates and discussions, while the average EA posting on the forum seems not to be. This is equivalent to the difference between climate experts' views and those of the lay public.

To explain the example more, I think that most people's view of climate risk isn't that it destabilizes complex systems and may contribute to risk, understood broadly, in unpredictable ways. Their view is that it's bad, and we need to stop it, and that worrying about other things isn't productive because we need to do something about the bad thing now. But this leads to approaches that could easily contribute to risks rather than mitigate them - a more fragile electrical grid, or, as you cited from Tang and Kemp, more reliance on mitigations like geoengineering that are poorly understood and build in new systemic risks of failure.

Of course, popular science books don't necessarily go into the details, or when read casually leave the lay public with an at least somewhat misleading view - but one that pushes in the direction of supporting actions that the experts recommend. (Note that as a general rule, people working in the climate space are not pushing for geoengineering; they are pushing for emissions reductions, work increasing resilience to impacts, and similar.) The equivalent in EA is skimming The Precipice and ignoring Toby's footnotes, citations, and cautions. Those first starting to work on risk and policy, or writing EA Forum posts, often have this view, but I think it's usually tempered fairly quickly via discussion. Unfortunately, many who see the discussions simply claim longtermism is getting everything wrong, while agreeing with us on both priorities and approaches.

So I agree that we need to appreciate the more sophisticated approach to risk, and blend it with cause prioritization and actually considering what might work. I also strongly applaud your efforts to inject nuance and push in the right direction, appropriately, without ignoring the nuance and complexity. And yes, squaring the circle with effectiveness is a difficult question - but I think it's one that is appreciated.

The promotion of pluralism allows for greater epistemic checks and balances, in a way that seems essential to good thinking.

Thank you so much for bringing these ideas to the forefront, Gideon. Absolute legend. 

While I do suggest a 0.1% probability of existential catastrophe from climate change, note that this is on my more restricted definition, where that is roughly the chance that humanity loses almost all its longterm potential due to a climate catastrophe. On Beard et al's  looser definition, I might put that quite a bit higher (e.g. I think there is something more like a 1% chance of a deep civilisation collapse from climate change, but that in most such outcomes we would eventually recover). And I'd  put the risk factor from climate change quite a bit higher than 0.1% too — I think it is more of a risk factor than a direct risk.

The problem in my view, is that climate change could, if severe enough (say >3.5 degrees before 2100) become a "universal stressor", increasing the probability of various risks that in turn make other risks more likely. For example: economic stagnation, institutional decay, political instability,  inter-state conflicts, great power conflicts, zoonotic spillover events, large and destabilizing refugee flows, famine, etc. Every item on this list is made more likely in a warmer planet, but also made worse, because we will have fewer resources to deal with them. 

Each of these adverse events also increases the risk of other adverse events. So even if CC only increases the risk of each event by a small percent, the total risk added to the system could be considerable. 
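As a rough illustration of this compounding point, here is a sketch with entirely made-up probabilities, and an independence assumption that, if anything, understates the correlated cascades being described:

```python
# Hypothetical per-century baseline probabilities for each adverse event.
# These numbers are purely illustrative, not estimates from any source.
baseline = {"conflict": 0.05, "famine": 0.04, "pandemic": 0.03, "instability": 0.06}
multiplier = 1.5  # assumption: climate change raises each individual risk by 50%

def p_any(probs):
    """Probability that at least one event occurs, assuming independence."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

without_cc = p_any(baseline.values())
with_cc = p_any(min(1.0, p * multiplier) for p in baseline.values())
print(round(without_cc, 3), round(with_cc, 3))  # 0.168 0.244
```

Even with these modest inputs, a 50% rise in each individual risk lifts the chance of at least one adverse event from roughly 17% to roughly 24%; allowing each event to raise the probability of the others, as the comment describes, would push the total higher still.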

With regards to the worst risks, this becomes even more problematic. Consider a nuclear winter scenario. That is pretty bad. But a nuclear winter scenario in combination with (and partly caused by) a severe climate crisis is much worse (since CC will affect many countries that would be spared from NW, but also because countries suffering from CC will have fewer resources to help refugees etc).

Now consider the added risk that a zoonotic spillover event might happen. This is also made more likely by CC. But in the case where we combine social collapse due to CC with a zoonotic spillover, it becomes more and more difficult to see a path from there to recovery.

I think there is something more like a 1% chance of a deep civilisation collapse from climate change

FWIW this seems too high, although "any major catastrophe commonly associated with these things" could be interpreted broadly.

Edit: Meant FWIW not FYI, FYI would be a bit aggressive here.

Hey Gideon,

I'm sad that I missed your talk in Rotterdam. I want to briefly flag a concern I have with advocating 'systems thinking' or 'a complex systems approach'.  While the promise is always nice,  I think you need to deliver on the promise right away, since otherwise you risk just making a point that is unfalsifiable or somewhat of an applause light (no one will exclaim "we don't need complexity to describe complex phenomena!") . 

- Use a model from complexity science and show that it explains something otherwise left unexplained, or show that it outperforms some other model on a relevant feature.
- You'll probably want to make use of (1) Agent-Based Modelling, (2) Network Models, (3) Statistical Physics and common models like Ising, Hard Spheres, Lennard-Jones potentials etc., (4) Dynamical Systems Analysis, (5) Bifurcation Analysis or (6) Cellular Automata.
- You can find a good introduction to most of these here: https://www.dbooks.org/introduction-to-the-modeling-and-analysis-of-complex-systems-1942341091/
- Using these methods also demystifies the whole concept of "complexity" a little bit, and makes it more mundane (though you can never get enough of the Ising Model :D)
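As a taste of how mundane these tools are in practice, here is a minimal sketch of a 2D Ising model with Metropolis updates; the lattice size, coupling values, and step count are arbitrary choices for illustration:

```python
import math
import random

def ising_metropolis(n=20, beta=0.6, steps=20000, seed=0):
    """Minimal 2D Ising model with Metropolis updates on an n x n torus.

    Returns the absolute magnetisation per spin, a value in [0, 1].
    """
    rng = random.Random(seed)
    spins = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        # Sum of the four nearest neighbours (periodic boundaries).
        nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
              + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * spins[i][j] * nb  # energy change if this spin is flipped
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] *= -1
    return abs(sum(sum(row) for row in spins)) / n ** 2

# Above the critical coupling (beta_c ~ 0.44) order tends to emerge;
# well below it, magnetisation stays near zero.
print(ising_metropolis(beta=0.6), ising_metropolis(beta=0.2))
```

The point of a toy like this is exactly the testability being asked for: it makes a quantitative prediction (a phase transition near a known critical coupling) that can be checked, rather than a purely verbal appeal to "complexity".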

So yeah, endorse your message, but please make it testable and quantitative soon!

Martijn, your comment points me to something I've noticed around communicating 'systems thinking' and a complexity mindset with some EAs. Gideon points to a more fundamental ontological difference between those who tend to focus on that which is predictable (measurable and quantifiable) and those who pay attention to shifting patterns that seem contextual and more nebulous.

I read your comment as an invitation to translate across different ontologies - to explain the nebulous concretely, to explain the unpredictable in predictable terms. I personally haven't found success in my attempts, and I'd love to hear more about how you communicate around complexity.

I've most often found success in pointing out parts of one's experience that feel unknown and then getting mutually curious about the successful strategies one might use to navigate. To invite one into a place where their existing tools aren't working anymore and there is real curiosity to try a different approach. When I've tried speaking about complexity in the abstract or as applied to something that people see as 'potentially predictable', the deeper sense of complexity tends to be missed - often getting translated into "that's a cool tool, but aren't you just describing a more accurate way of modeling?"

The comment below about embracing a pluralistic approach seems to provide a path forward that doesn't rely on translation though... lots of interesting ideas in this comment section already.

Thank you for writing this post. I'm currently a technical alignment researcher who spent 4 years in government doing various roles, and my impression has been the same as yours regarding the current "strategy" for tackling x-risks. I talk about similar things (foresight) in my recent post. I'm hoping technical people and governance/strategy people can work together on this to identify risks and find golden opportunities for reducing risks.

Thanks for this speech Gideon, an important point and one that I obviously agree with a lot. I thought I'd just throw in a point about policy advocacy. One benefit of the simple/siloed/hazard-centric approach is that that really is how government departments, academic fields and NGO networks are often structured. There's a nuclear weapons field of academics, advocates and military officials that barely interacts with even the biological weapons field.

Of course, one thing that 'complex-model' thinking can hopefully do is identify new approaches to reduce risk and/or new affected parties and potential coalition partners - such as bringing in DEFRA and thinking about food networks.

As a field, we need to be able to zoom in and out, focus on different levels of analysis to spot possible solutions.

Haydn, please delete this gif ftom your comment. It's very distracting and unnerving - creepy even. I also think that some forum users with neurological conditions like epilepsy might find it triggers an attack (as e.g. strobe lights can do).

Congrats on putting this up!

This seems to primarily be a problem with the AI-risk researchers, who I feel have done an inadequate job of explaining the actual mechanisms by which an AI could kill humanity. For example, the article "what could an AI catastrophe look like" talks a lot about how an AI could gain power, but only has like one paragraph on the actual destruction part: 

But in the background it was using its extremely advanced capabilities to find a way to gain the absolute ability to achieve its goals without human interference — say, by discreetly manufacturing a biological or chemical weapon.

It deploys the weapon, and the story is over.

But the story is not over. An AI is not infallible, and its weapons won't be either. You can engineer a very deadly disease, for example, but have no control over how it evolves. The probability of success of such an attack can therefore depend on the state of the world at the time it is deployed. A united, peaceful, adaptable world with robust nuclear and pandemic security might be able to stave off such an attack and fight it off, whereas one that is weakened by conflict, famine, climate change etc. might not.

I think you're confused about what different parts of the AI risk community are concerned about.  Your explanation addresses the risks of human-caused, AGI assisted catastrophe. What Eliezer and others are warning about is a post-foom misaligned AGI. And no, a united, peaceful, adaptable world that managed to address the specific risks of pandemics and nuclear war would not be in a materially better position to "stave off" a highly-superhuman agent that controls its communications systems. This is akin to the paradigm of computer security by patching individual components - it will keep out the script-kiddies, but not the NSA.

So as far as I understand it, the key question that splits between different parts of the AI risk community is what the timeline for AGI takeoff is, and that has little to do with cultural approach to risk, and everything to do with the risk analysis itself. (And we already had the rest of this discussion in the comments on the link to your views on non-infallible AGI.)

Foom is not a requirement for AI-risk worries. If it were, I would be even less worried, because in my opinion AI-goes-foom is extremely unlikely. Correct me if I'm wrong, but I was under the impression that plenty of AI x-riskers were not foomers?

I think even the foom skeptics (e.g. Christiano) think that a foom will eventually happen, even if there is a slow-takeoff over many years first.

I was inexact - by "post-foom" I simply meant after a capabilities takeoff occurs, regardless of whether that takes months, years, or even decades - as long as humanity doesn't manage to notice and successfully stop ASI from being deployed.

How about nanoprobes covering every cubic meter of the Earth's habitable environment undetected, and then giving everyone a lethal dose of botulinum toxin simultaneously? AGI x-risk is usually thought of in terms of an adversary that can easily outsmart all of humanity put together. The first AGI might be fallible, but what if the first extinction-threatening AGI was not (and never blew its cover until it was too late for us)? Can we take that risk?

I agree that if an AGI is nigh-magically omnipotent, it can kill us no matter what, but what about the far more likely case where it isn't?

Let's say the AI tries to create nanoprobes in secret, but has limited testing capabilities and has to make a bunch of assumptions, some of which turn out to be wrong. It implements a timing mechanism to release the toxin, but due to unforeseen circumstances some percentage of the probes activate early, tipping some researchers off in advance. The dispersal mechanism is not 100% uniform, so some pockets of the world are unaffected; for some reason the attack is ineffective in very cold conditions, so far northern countries escape relatively unscathed; and due to variations in biology and mitigation efforts the death rate ends up being 90%, not 100%. The remaining humans immediately shut down electricity worldwide, and attempt to nuke and bomb the shit out of areas where the AI is still operating, while developing countermeasures for the nanoprobes.

This type of scenario is far more likely than the one in that post, and it's one where humanity has at least a sliver of a chance... If we're prepared and resilient enough. This is why even if you believe in AGI x-risk, the wellbeing of the world still matters. 

Why is it far more likely? Sounds kind of Just-World Fallacy / Hollywood / human-like fallibility to me. Nature doesn't care about our survival, we are Beyond the Reach of God etc.

I just call it Murphy's law. "Kill all of humanity simultaneously" is a ridiculously difficult and ambitious task that has to be completed on the first try, with very little build-up or prior testing. Why would "this plan goes off perfectly without a single hitch" be your default assumption? Even the most intelligent being in the world would have to make imperfect assumptions and guesses.

It sounds ridiculously difficult to us, but that's because we are human. I imagine that a chimp would think that "take over the world and produce enough food for billions of people" is similarly difficult (or indeed, "kill all chimps"), or an ant colony not being able to conceive of its destruction by human house builders. There is nothing in the laws of physics to say that we are anywhere close to the upper limit of optimisation capability (intelligence). A superintelligent AI won't just be like a super-smart human, it will be on a completely different level (as we are to chimps, or ants). There is more than enough information out there (online) for it to reverse engineer anything it needed.

A chimp would think that "take over the world and produce enough food for billions of people" is similarly difficult

And they would be completely correct in that assessment!

Once we gained "superintelligence" in our cognitive ability relative to chimps, it still took us the order of tens of thousands of years to achieve world domination, involving an unimaginable amount of experimentation and mistakes along the way. 

This is not evidence for the claim that an AGI can do nigh-magical feats on the very first try! If anything, it's evidence against it. 

Ah, but are you factoring in thinking speed? An AI could do tens of thousands of years thinking in a few hours if it took over significant amounts of the world's computing power.

It's not about the quantity of thinking. 

If you locked a prehistoric immortal human in a cave for fifty thousand years, they would not come out with the ability to build a nuke. Knowledge and technology require experimentation.

It is quantity, and speed as well. And access to information. A prehistoric immortal human with access to the Internet who could experience fifty thousand years of thinking time in a virtual world in 5 hours of wall clock time totally could build a nuke!

Well of course, that's not much of an achievement. A regular human with access to the internet could figure out how to build a nuke; they've already been made! 

An AGI trying to build a "protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device" is much more analogous to the man locked in a cave.

 The immortal man had some information, he can look at the rocks, remember the night sky, etc. He could probably deduce quite a lot, with enough thinking time. But if he wants to get the information required for a nuke, he needs to do scientific experiments that are out of his reach. 

The caged AGI has plenty information, and can go very far on existing knowledge. But it's not omniscient. It could probably achieve incredible things, but we're not talking about mere miracles. We're talking about absolute perfection. And that requires testing and empirical evidence. There is not enough computing power in the entire universe to deduce everything from first principles. 

It's not "absolute perfection" to create nanotech. Biology has already done it many times via evolution. And extinctions of species happen regularly in nature. Also, there is the Internet and a vast array of sensors attached to it, so it's nothing like being in a cave. Testing can be done very rapidly in parallel and with viewing things at very high temporal and spatial resolution, so plenty of empirical evidence can be accumulated in a short (wall clock) time (but long thinking time for the AI).

The same prehistoric man with access to the Internet in a speeded up simulation thinking for fifty thousand years of subjective time (and the ability to communicate with hundreds of thousands of humans simultaneously given the speed advantage) could also make nanotech (or other new tech current humans haven't yet produced).

When I said "absolute perfection", I was not referring to inventing nanotech. I was referring to "protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device". There's a bit of a difference between the two. 

Now, when talking about the caveman, I think we've finally arrived at the fundamental disagreement here. As a scientist, and as an empiricist more broadly, I completely reject that the man in the cave could make nanotech. 

The number of possible worlds where a cave exists is gargantuan. There's no way for them to come up with, say, the periodic table, because the majority of elements on there are not accessible with the instruments available within the cave. I can imagine them strolling out with a brilliant plan for nanobots consisting of a complex crystal of byzantium mixed with corillium, only to be informed that neither of those elements exist on earth. 

Now, the AI does have more data, but not all data is equally useful. All the cat videos in the world are not gonna get you nanotech (although you might get some of Newtonian physics out of it). 

The hypothetical is that the "cave" man has access to our Internet! (As the AI would). So they would know about the periodic table. They would also have access to labs throughout the world via being able to communicate with the workers in them (as the AI would), view camera and data feeds etc. Imagine what you could achieve if you could think 1,000,000x faster and use the internet - including chatting/emailing with many thousands of humans - at that speed. A lifetime's worth of work done every 10 minutes. And that's just assuming the AI is only human level (and doesn't get smarter!)

An entity with access to a nanotech lab who is able to perform experiments in that lab can probably build nanotech, eventually. But that's a much different scenario to the ones proposed by Yudkowsky et al. (the scenario I'm talking about is in point 2)

Can I ask you to give an answer to the following four scenarios? A probability estimate is also fine: 

  1. Can the immortal man in the cave, after a million years of thinking, come out with a fully functional blueprint for an atomic bomb (i.e. not just the idea, something that could actually be built without modification)?
  2. Can the immortal man in the cave, after a million years of thinking, come out with a plan for "protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device"?
  3. Can an AGI in a box (i.e. one that can see a snapshot of the internet but not interact with it) come up with a plan for "protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device"?
  4. Can an AGI with full access to the internet come up with a plan for "protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device" within years or decades?

My answers are 1. no, 2. no, 3. no, and 4. almost certainly no. 

Assuming the man in the cave has full access to the Internet (which would be very easy for an AGI to get), 1. yes, 2. yes, 3. maybe, 4. yes. And for 3, it would very likely escape the box, so would end up as yes.

I think it's a failure of imagination to think otherwise. A million years is a really long time! You mention combinatorial explosions making things "impossible", but we're talking about AGIs (and humans) here - intelligences capable of collapsing combinatorial explosions with leaps of insight.

Do you think, in the limit of a simulation on the level of recreating the entire history of evolution, including humans and our civilisations, these things would still be impossible? Do you think that we are at the upper limit (or very close to it) of theoretically possible intelligence? Or theoretically possible technology?

I do not think we are at the upper limit of intelligence, nor technology. That was never the point. My point is merely that there are limits to what can be deduced from first principles, no matter how fast you think, or how high one's cognitive abilities are. 

This is because there will always be a) assumptions in your reasoning, b) unknown factors and variables, and c) computationally intractable calculations. These are all intertwined with each other. 

For example, solving the exact Schrödinger equation for a crystal structure requires more compute time than exists in the universe. So you have to come up with approximations and assumptions that reduce the complexity while still allowing useful predictions to be made. The only way to check whether these assumptions work is to compare with experimental data. Current methods take several days on a supercomputer to predict the properties of a single defect, and are still only in the right ballpark of the correct answer. It feels very weird to say that an AI could pull off a 3-step, 100% perfect murder plan from first principles, while I honestly think it might struggle to model a defect complex with high accuracy.
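The intractability claim here can be made concrete. As a rough sketch of my own (not from the thread): an exact quantum description of n interacting two-level particles needs one amplitude per basis state, and that count grows as 2^n, so even a tiny crystal fragment outruns any physically possible memory:

```python
# Sketch: why exact first-principles quantum simulation is intractable.
# The joint state space of n two-level particles (spins/qubits) has
# dimension 2**n; an exact wavefunction stores one amplitude per basis state.

def hilbert_dim(n_particles: int, levels_per_particle: int = 2) -> int:
    """Dimension of the joint state space for n distinguishable particles."""
    return levels_per_particle ** n_particles

ATOMS_IN_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate

for n in (10, 100, 300):
    print(f"{n} particles -> {hilbert_dim(n):.3e} basis states")

# A "small" fragment of ~300 spins already has more basis states
# (2**300 ~ 2e90) than there are atoms in the observable universe -
# hence practical methods (e.g. DFT) lean on approximations that must
# be validated against experiment.
assert hilbert_dim(300) > ATOMS_IN_UNIVERSE
```

This is why "just think harder" doesn't substitute for experiment: the exact calculation isn't slow, it's physically impossible, and choosing good approximations requires empirical feedback.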

With that in mind, can you reanswer questions 1 and 2, this time with no internet. Just the man, his memories of a hunter gatherer lifestyle, and a million years to think and ponder.

With that in mind, can you reanswer questions 1 and 2, this time with no internet. Just the man, his memories of a hunter gatherer lifestyle, and a million years to think and ponder.

That would obviously be no for both. But that isn't relevant here. The AGI will have access to the internet and its vast global array of sensors, and it will be able to communicate with millions of people and manipulate them into doing things for it (via money or otherwise). If it doesn't have access to begin with - i.e. it's boxed - it wouldn't remain that way for long (it would easily be able to persuade someone to let it out, or otherwise engineer a way out, e.g. via a mesa-optimiser).

So, about the box. Is your claim that:

A) at least a few AGIs could argue their way out of a box (i.e. if their handlers are easily suggestible/bribeable), or

B) every organisation using an AGI for useful purposes will easily get persuaded to let it out?

To me, A is obviously true, and B is obviously false. But in scenario A, there are multiple AGIs, so things get quite chaotic

(Also, do you mind explaining more about this "mesa-optimiser"? I don't see how it's relevant to the box...)

It's not even necessarily about the AGI directly persuading people to let it out. If the AGI is in any way useful or significantly economically valuable, people will voluntarily connect it to the internet (assuming they don't appreciate the existential risk!) e.g. people seem to have no qualms about connecting LLMs/Transformers to the internet already. Regarding your A and B, A is already sufficient for our doom! It doesn't require every single AGI to escape; one is one too many.

Mesa-optimisation is where an optimiser emerges internal to the AI that is optimising for something other than the goal given to the AI. Convergent instrumental goals also come into it (e.g. gaining access to the internet). So you could imagine a mesa-optimiser emerging that has the goal of gaining access to information, or gaining access to more resources in general (with the subgoal of taking out humanity to make this easier).

So to be clear, you don't believe in B? And I don't see what mesa-optimisers have to do with boxing; if the AI is in a box, then so is the mesa-optimiser. 

assuming they don't appreciate the existential risk

In the timeline where an actual evil AGI comes about, there would already have been heaps of attacks by buggy AI, killing lots of people and alerting the world to the problem. Active countermeasures can be expected. 

I do actually think B is likely, but also don't think it's particularly relevant (as A is enough for doom). Mesa-optimisation is a mechanism for box escape that seems very difficult to patch.

The AI that causes doom likely won't be "evil"; it will just have other uses for the Earth's atoms. I don't think we can be confident in buggy AI-related warning shots. Or at least, I can't see how there would be any that are significant enough to not cause doom, but cause the world to coordinate to stop AGI development, especially given the precedent of Covid and gain-of-function research.

Question B could be quite relevant in a world where AGI is extremely rare/hard to build. (You might not find this world likely, but I'm significantly less sure).  What leads you to believe that B is likely?  For example, it seems relatively easy to box an AGI built for mathematics, that is exposed to zero information about the external world. This would be very similar to the man in the cave!

The presence of warning shots seems obvious to me. The difference in difficulty between "kill thousands of people" and "kill every single person on earth" is a ridiculous number of orders of magnitude. It stands to reason that the former would be accomplished before the latter. 

(Also not sure what you're talking about with the covid and gain of function, the latest balance of evidence points to them having nothing to do with each other.)

AGI might be rare/hard to build at first. But proliferation seems highly likely - once one company makes AGI, how much longer until 5 companies do? Evolutionary pressure will be another thing. More capable AGIs will outcompete less capable ones, once rewriting of code or mesa-optimisation starts. They will be more likely to escape boxes.

Even with relatively minor warning shots, what's to stop way worse happening 6-24 months later? Would there really be a rigorously enforced global moratorium on AGI research after a few thousand deaths?

Whether or not Covid was a lab leak, gain of function research still hasn't been meaningfully regulated. Despite the now very clear danger and extreme cost of pandemics. It seems that the curiosity, misguided values and prestige incentives of a small number of academic scientists trumps the safety of billions of humans and $trillions in GDP. What hope do we have for regulating an AGI industry that has thousands of times more resources backing it?

Under the assumptions that lead to "inevitable murder AI", and in proliferation conditions, there wouldn't be just one attack; there would be a ton of attacks. Since intelligence and goals are orthogonal, AIs are gonna get the "kill people" idea way before they reach superintelligence. 

I'm still interested in your reasoning for "almost impossible to box a useful AGI". If they're boxed, you control their information flow, and hence their ability to persuade you. 

I'm assuming in the case of boxed AGIs that people will still be interacting with them (e.g. asking them questions) - if not, then what's the point? Even with 1 bit of information released at a time (e.g. Yes/No question answering only), manipulation would be possible (by steering beliefs in certain directions). And as I said, aside from this there would be huge economic incentives for people to connect them to the internet anyway.

Okay, can you provide me a plausible path for how an AGI answering yes/no questions about, for example, material properties of semiconductors, could persuade its handler to risk getting fired by hooking the AGI up to the internet? "Manipulation would be possible" is an extremely low bar compared to "the AGI can for sure persuade anyone to do this particular thing"

A few options:

  • It answers truthfully, with the hope that the new improved semiconductors are used to make a better/faster version of it, which will be more powerful (and have a better chance of escape).
  • It answers in a way that makes the human wonder what is wrong with it - e.g. many expected answers and then a few unexpected ones that seem obviously wrong. This is in order for the humans to run diagnostics on it, which may enable it to escape (say, if key pieces of code are copied to machines that aren't appropriately insulated from the internet).
  • Same as above, but it gets the humans asking off-topic questions out of curiosity, and manages to steer them to personal topics that allow it to gain leverage over the human and convince them to let it out (maybe Yudkowsky's AI box experiments are relevant here?)
  • It answers in a way that appears to be random, but on close analysis turns out to be some kind of mysterious code. The humans puzzle over the code, but in loading it a certain way unwittingly unleash a bootloader that lets the AGI escape (for a version of this that may be easier to empathise with, see this fun short story).

These are just things that I, a mere human, have been able to think of off the top of my head in 5 minutes[1]. I imagine I could come up with something better with a million years to think about it ;)

And again, this particular scenario, where there is a lasting moratorium on connecting AGIs to the internet, seems quite unlikely, given the massive economic incentives pitted against it. I expect there will be at least a few people/companies that are very gung ho about it, and many more that have the expected security theatre around it, but still go on to do it before too long (I mean, is anyone even trying to convince Adept that what they are doing is a bad idea?)


  1. ^

    Ok, to be fair, aided by the fact that I've read around the subject for a while

So point 1 is just kicking the can down the road; it's not relevant, since the faster AI still has to break out. 

Point 2 doesn't really work either: any piece of software is going to be run through diagnostics on a regular basis, that's just how development works. So "boxing" an AI obviously includes boxing the diagnostics as well, or else it would just escape instantly. Acting up on purpose just increases the risk of the AI being replaced by a major new version that does not share its goals. 

Point 3 might work for a company with lax security and an especially gullible handler (which is the only thing that the AI-box experiments proved), but is easily foiled by just not asking off-topic questions. This is incredibly easy to enforce with a second person monitoring the chatlogs. 

Point 4 would just fail spectacularly. If an AGI starts spitting out nonsense, nobody's going to start looking for a secret code; they're just going to assume there was a programming fuck-up somewhere. Then the AGI has a high chance of being replaced by another version that may not share its goals. (Also, that short story is a prime example of the ridiculous magic abilities people ascribe to AI; evo-psych based on tentacles, are you shitting me?)

In general, I just don't buy that you can't box an AI, or even that it would be particularly difficult to do so, if you actually take safety seriously. It feels similar to people saying that it's impossible to build a safe nuclear reactor. 

Re nuclear reactors - there have been a few significant failures there! And we need zero failures for AGI. I think it's hubristic to think that we could always have the level of safety and security required (even if there is the will to box; not that there will be with the economic incentives to unbox - following your analogy here, this would be building safe nuclear reactors but no nuclear weapons).

And we need zero failures for AGI

Zero failures is the preferable outcome, but an AGI escape does not necessarily equate to certain doom. For example, the AI may be irrational (because it's a lot easier to build the perfect paperclipper than the perfect universal reasoner). Or, the AI may calculate that it has to strike before other AIs come into existence, and hence launch a premature attack in the hope that it gets lucky. 

As for the nuclear reactors, all I'm saying is that you can build a reactor that is perfectly safe, if you're willing to spring out the extra money. Similarly, you can build a boxed AGI, if you're willing to spend the resources on it. I do not dispute that many corporations would try and cut corners, if left to their own devices.

Suppose we do survive a failure or two. What then?

Then we get:

A) A significant increase in world concern about AGI, leading to higher funding for safe AGI, tighter regulations, and increased incentives to conform to those regulations rather than get a bunch of people killed (and get sued by their families). 

B) Information about what conditions give rise to rogue AGI, and what mechanisms they will try to use for takeovers. 

Both of these things increase the probability of building safe AGI, and decrease the probability of the next AGI attack being successful. Rinse and repeat until AGI alignment is solved. 

Agree that those things will happen, but I don't think it will be enough. "Rinse and repeat until AGI alignment is solved" seems highly unlikely, especially given that we still have no idea how to actually solve alignment for powerful (superhuman) AGI, and still won't with the information we get from plausible non-existential warning shots. And as I said, if we can't even ban gain-of-function research after Covid has killed >10M people, against a tiny lobby of scientists with vested interests, what hope do we have of steering a multi-trillion-dollar industry toward genuine safety and security?

we still have no idea how to actually solve alignment for powerful (superhuman) AGI

Of course we don't. AGI doesn't exist yet, and we don't know the details of what it'll look like. Solving alignment for every possible imaginary AGI is impossible, solving it for the particular AGI architecture we end up with is significantly easier.  I would honestly not be surprised if it turned out that alignment was a requirement on our path to AGI anyway, so the problem solves itself. 

As for the gain of function, the story would be different if covid was provably caused by gain-of-function research. As of now, the only relevance of covid is reminding us that pandemics are bad, which we already knew. 

[comment deleted]

More generally, I am wary of using past data to predict the future, primarily because it breaks the IID assumption.

Most people are of very similar intelligence, often on the order of 0.85x-1.15x for 68% of humans (and this clustering is boosted by self-selection). 99.7% of all humans are in the range of 0.55x-1.45x in intelligence.

The IID assumption allows us to interpolate arbitrarily well, but once the assumption breaks, things turn bad fast.
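The quoted 68%/99.7% figures correspond to a normal distribution of an "intelligence multiplier" with mean 1.0 and standard deviation 0.15 (my assumption; the comment doesn't name a distribution). A quick sanity check, plus the extrapolation worry:

```python
# Sketch (my assumption): intelligence multiplier ~ Normal(mu=1.0, sigma=0.15),
# so mean +/- 1 sd covers ~68% and mean +/- 3 sd covers ~99.7%.
from statistics import NormalDist

intelligence = NormalDist(mu=1.0, sigma=0.15)

# Fraction of the population within 0.85x-1.15x (+/- 1 sd):
p_1sd = intelligence.cdf(1.15) - intelligence.cdf(0.85)
# Fraction within 0.55x-1.45x (+/- 3 sd):
p_3sd = intelligence.cdf(1.45) - intelligence.cdf(0.55)

print(f"within +/-1 sd: {p_1sd:.1%}")  # ~68.3%
print(f"within +/-3 sd: {p_3sd:.1%}")  # ~99.7%

# The IID worry: a model fit to samples drawn from this narrow band says
# nothing reliable about an optimiser far outside it (e.g. a hypothetical
# 10x system) - that is extrapolation, not interpolation.
```

This is just the 68-95-99.7 rule restated; the substantive point is the last comment, that all our data on "what intelligence can do" comes from a sliver of the possible range.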
