Polarisation hampers cooperation and progress towards understanding whether future AI poses an existential risk to humanity and how to reduce the risks of catastrophic outcomes. It is exceptionally challenging to pin down what these risks are and which decisions would best address them. We believe that a model-based approach offers many advantages for improving our understanding of risks from AI, estimating the value of mitigation policies, and fostering communication between people on different sides of AI risk arguments. We also believe that many practitioners in the AI safety and alignment communities have the skill sets needed to use model-based approaches successfully.
In this article, we will lead you through an example application of a model-based approach to the risk of an existential catastrophe from unaligned AI: a probabilistic model based on Carlsmith’s "Is Power-seeking AI an Existential Risk?" You will interact with our model, explore your own assumptions, and (we hope) develop your own ideas for how this type of approach might be relevant in your own work. You can find a link to the model here.
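To give a feel for what a probabilistic model of this kind can involve, here is a minimal Python sketch. It assumes the risk is decomposed into a chain of premises, each with an uncertain conditional probability, and that the premises are sampled independently and multiplied. The premise names, the Beta parameters, and the function names are all hypothetical placeholders chosen for illustration; they are not taken from the model linked above or from Carlsmith's report.

```python
import random
import statistics

# Hypothetical placeholder inputs (NOT values from the article or from Carlsmith's report).
# Each entry holds (alpha, beta) parameters of a Beta distribution that expresses
# uncertainty about one conditional probability in the chain.
PREMISES = {
    "premise_1": (6, 4),  # mean 0.6
    "premise_2": (8, 2),  # mean 0.8
    "premise_3": (4, 6),  # mean 0.4
}


def sample_joint_probability(premises, rng):
    """Draw one probability per premise and multiply them together."""
    joint = 1.0
    for alpha, beta in premises.values():
        joint *= rng.betavariate(alpha, beta)
    return joint


def simulate(premises, n_samples=10_000, seed=0):
    """Return Monte Carlo samples of the joint probability of all premises."""
    rng = random.Random(seed)
    return [sample_joint_probability(premises, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    samples = simulate(PREMISES)
    cut_points = statistics.quantiles(samples, n=20)  # 19 cut points at 5% steps
    print(f"mean joint probability: {statistics.mean(samples):.3f}")
    print(f"5th to 95th percentile: {cut_points[0]:.3f} to {cut_points[-1]:.3f}")
```

Changing the entries in `PREMISES` and rerunning is a crude way to explore how different assumptions shift the resulting distribution, which is the kind of exploration a model-based approach is meant to support.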