Author: Leonard Dung
Abstract: Many researchers and intellectuals warn about extreme risks from artificial intelligence. However, these warnings have typically come without systematic arguments in support. This paper provides an argument that AI will lead to the permanent disempowerment of humanity, e.g. human extinction, by 2100. It rests on four substantive premises which it motivates and defends: first, the speed of advances in AI capability, as well as the capability level current systems have already reached, suggest that it is practically possible to build AI systems capable of disempowering humanity by 2100. Second, due to incentives and coordination problems, if it is possible to build such AI, it will be built. Third, since it appears to be a hard technical problem to build AI which is aligned with the goals of its designers, and many actors might build powerful AI, misaligned powerful AI will be built. Fourth, because disempowering humanity is useful for a large range of misaligned goals, such AI will try to disempower humanity. If AI is capable of disempowering humanity and tries to disempower humanity by 2100, then humanity will be disempowered by 2100. This conclusion has immense moral and prudential significance.
My thoughts: I read through it rather quickly, so take what I say with a grain of salt. That said, it seemed persuasive and well-written, and the way the authors split up the argument was quite nice. I'm very happy to see an attempt to make this argument more philosophically rigorous, and I hope to see more work in this vein.
I've looked into the game theory of war literature a bit, and my impression is that economists are still pretty confused about war. As you mention, the simplest model predicts that rational agents should prefer negotiated settlements to war, and it seems unsettled what actually causes wars among humans. (People have proposed more complex models incorporating more elements of reality, but AFAIK there isn't a consensus as to which model gives the best explanation of why wars occur.) I think it makes sense to be aware of this literature and its ideas, but there's not a strong argument for deferring to it over one's own ideas or intuitions.
My own thinking is that war between AIs and humans could happen in many ways. One simple (easy-to-understand) way is that agents will generally refuse any settlement worse than what they think they could obtain on their own by going to war. Human irrationality could therefore cause a war when, e.g., the AI faction thinks it will win with 99% probability while humans think they could win with 50% probability: each side then demands more of the lightcone (or resources in general) than the other side is willing to grant. A worked sketch of this arithmetic is below.
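To make that arithmetic concrete, here is a minimal sketch in Python of the standard one-shot bargaining model of war (pie normalized to 1; each side compares a proposed split to its expected value of fighting), using the 99%-vs-50% numbers above. The function name and the war-cost parameters are my own illustrative assumptions, not anything from the paper or the comment; the point is just that a belief gap larger than the total cost of fighting empties the bargaining range.

```python
# A minimal sketch (illustrative assumptions, not from the paper) of the
# one-shot bargaining model of war with disagreement about who would win.

def bargaining_range(p_ai_believes, p_humans_concede, cost_ai, cost_humans):
    """Return the interval of peaceful splits x (AI's share of the pie, 0..1)
    that both sides weakly prefer to war, or None if the interval is empty.

    p_ai_believes   : AI faction's subjective probability that it wins a war
    p_humans_concede: AI-win probability according to the humans' beliefs
    cost_ai, cost_humans: each side's expected cost of fighting (as pie share)
    """
    # The AI accepts any x >= its expected value of war: p_ai_believes - cost_ai.
    ai_minimum = p_ai_believes - cost_ai
    # Humans accept any x with 1 - x >= (1 - p_humans_concede) - cost_humans,
    # i.e. x <= p_humans_concede + cost_humans.
    humans_maximum = p_humans_concede + cost_humans
    if ai_minimum <= humans_maximum:
        return (max(0.0, ai_minimum), min(1.0, humans_maximum))
    return None  # no split satisfies both sides -> war

# Shared beliefs: a peaceful bargain exists (range width = cost_ai + cost_humans).
print(bargaining_range(0.99, 0.99, cost_ai=0.05, cost_humans=0.05))  # (0.94, 1.0)

# Disagreement from the example: AI thinks it wins 99% of the time, humans think
# the AI wins only 50% of the time. The belief gap (0.49) exceeds the total war
# costs (0.10), so the bargaining range is empty and war follows.
print(bargaining_range(0.99, 0.50, cost_ai=0.05, cost_humans=0.05))  # None
```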
To take this one step further: given that many deviations from the simplest game-theoretic model do predict war, war among consequentialist agents may well be the default in some sense. Also, given that humans often do (or did) go to war with each other, our shared values (i.e., the extent to which we do have empathy/altruism for others) must contribute to the current relative peace in some way.