Hide table of contents

If you're interested in working on similar questions, consider applying to work with us. We're currently looking for full-time researchers as well as summer research fellows. The application deadline is on March 15. You can find all the details here.


In this article, I sketch arguments for the following claims:

  • Transformative AI scenarios involving multiple systems pose a unique existential risk: catastrophic bargaining failure between multiple AI systems (or joint AI-human systems).
  • This risk is not sufficiently addressed by successfully aligning those systems, and we cannot safely delegate its solution to the AI systems themselves.
  • Developers are better positioned than more far-sighted successor agents to coordinate in a way that solves this problem, but a solution also does not seem guaranteed.
  • Developers intent on solving this problem can choose between developing separate but compatible systems that do not engage in costly conflict or building a single joint system.
  • While the second option seems preferable from an altruistic perspective, there appear to be at least weak reasons that favor the first one from the perspective of the developers.
  • Several avenues for (governance) interventions present themselves: increasing awareness of the problem among developers, facilitating the reaching of agreements (perhaps those for building a joint system in particular), and making development go well in the absence of problem awareness.




No comments on this post yet.
Be the first to respond.
Curated and popular this week
Relevant opportunities