Linkposting, tagging and excerpting in accord with 'Should pretty much all content that's EA-relevant and/or created by EAs be (link)posted to the Forum?'.
I consider this to be one of the best posts on the AI Alignment Forum. I find it surprising that it wasn't cross-posted to the EA Forum along with the preceding two posts in the sequence ('AI strategy nearcasting' and 'How might we align transformative AI if it’s developed very soon?'), especially given its non-technical nature: this post focuses on a core issue in AI strategy, the “deployment problem”.
When thinking about how to make the best of the most important century, two “problems” loom large in my mind:
- The AI alignment problem: how to build AI systems that perform as intended, and avoid a world run by misaligned AI.
- The AI deployment problem (briefly discussed here): the question of how and when to (attempt to) build and deploy powerful AI systems, under conditions of uncertainty about how safe they will be and how close others are to deploying powerful AI of their own.
A previous piece discussed the alignment problem; this one discusses the deployment problem.
Summary of the post (bearing in mind that within the nearcast, I’m using present tense and not heavily flagging uncertainty):
- I’ll break this scenario up into three stylized “phases,” even though in practice I think the boundaries between them could be fuzzy.
  - “Phase 1” refers to the period of time when there aren’t yet dramatic new (safe) capabilities available to the world via highly powerful (e.g., transformative) AI systems. In this phase, Magma [a major AI company] believes itself to be close to developing transformative AI systems, but has not yet done so - and/or has not yet deployed such AI systems because it can’t be confident enough that they’re aligned. A major goal is “try to get some AI system(s) to be both highly powerful (to the point where they could qualify as transformative) and reliably aligned.”
  - “Phase 2” refers to the period of time after Magma has succeeded in getting some AI system to be both highly powerful (e.g., transformative) and reliably aligned - but there is still a major threat of other, less cautious actors around the world possibly deploying powerful misaligned AI. In this phase, Magma and IAIA [an organization, which could range from a private nonprofit to a treaty-backed international agency, that tracks transformative AI projects and takes actions to censure or shut down dangerous ones] focus on reducing that risk, hopefully with help from powerful technologies that didn’t exist in Phase 1.
  - “Phase 3” begins once Magma and IAIA have succeeded at this, so there is a very low risk, globally, of anyone deploying misaligned AI systems. Now the main risks come from things like human misuse of powerful AI systems that behave as their human users intend.
- In “Phase 1” - before both-transformative-and-aligned AI systems - major priorities should include the following:
  - Magma should be centrally focused on increasing the odds that its systems are aligned, discussed in a previous post. It should also be prioritizing internal security (both to prevent its AI systems from using security exploits and to prevent exfiltration of critical information, especially its AI systems' weights); exploring deals with other companies to reduce “racing” pressure (among other benefits); and producing “public goods” that can help actors worldwide reduce their level of risk (e.g., evidence about whether misaligned AI is a real risk and about what alignment methods are/aren’t working).
  - IAIA can be working on monitoring AI companies (with permission and help in the case where IAIA is a nonprofit, with legal backing in the case where it is e.g. a regulatory body); ensuring that companies developing potentially transformative AI systems have good security practices, good information security and good information sharing practices; and helping disseminate the sorts of “public goods” noted above. [...]
  - Both Magma and IAIA should be operating with the principle of selective information sharing in mind - e.g., sharing some information with cautious actors but not with incautious ones. [...]
- In “Phase 2” - as aligned-and-transformative AI systems become available - major priorities should continue to include the above, as well as a number of additional tactics for reducing risks from other actors (briefly noted in a previous piece):
  - Magma and IAIA should be deploying aligned AI systems, in partnership with governments and via commercial and nonprofit means, that can contribute to defense/deterrence/hardening. [...]
  - Magma should be developing ever-better (including cheaper and easier) approaches to aligning AI systems, as well as generating other insights about how to handle the situation as a whole, which can be offered to IAIA and other actors throughout the world. [...]
  - Magma should be continuing to improve its AI systems’ capabilities, so that Magma’s aligned systems continue to be more capable than others’ (potentially less safe) systems. It should then be helping IAIA to take full advantage of any ways in which these highly capable systems might be helpful (e.g., for tracking and/or countering dangerous projects).
  - It may turn out that, absent government involvement, other actors look set to deploy powerful unsafe systems whose harm can’t be stopped/contained even with the help of the best safe systems. In this case, IAIA - with help from Magma and other AI companies - should take more drastic actions (and/or recommend that governments take these actions), such as:
    - Clamping down on AI development (generally, or in particular dangerous settings).
    - To the extent feasible and needed, credibly threatening to employ (or if necessary employing) powerful technologies that could help enforce regulatory agreements, e.g. via resource accumulation, detecting violations of the regulatory framework, military applications, etc.
- At some point (“Phase 3”), the risk of a world run by misaligned AI hopefully falls to very low levels. At this point, it’s likely that many actors are using advanced, aligned AI systems.
  - From there, the general focus becomes working toward a world in which humans are broadly more capable and more inclined to prioritize the good of all beings across the world and across time.
- I’ll briefly run through some implications that seem to follow if my above picture is accepted, largely to highlight the ways in which (despite being vague in many respects) my picture is implying nontrivial things. Future pieces will go into more detail about implications for today’s world.