In this post, we (Nicholas Emery-Xu, Andrew Park, and Robert Trager) summarize the results of our paper modeling the role of information in risky technology races. The current version can be accessed here.
1 Is revealing technological capability beneficial?
Imagine that, in the not-too-distant future, actors are developing a new technology that gives the winner a discontinuous advantage, economic or geopolitical, over their rivals. Because of this advantage, actors are tempted to cut corners on safety to focus on capabilities research, producing a safety-performance tradeoff. In such a scenario, is it better if actors know each other’s capability levels? On the one hand, public knowledge of capabilities prevents actors from mistakenly believing they are behind and then entering a dangerous arms race. On the other hand, if actors learn they are close in capability, they may engage in a dangerous race to the bottom, cutting corners on safety to gain a small edge. In many cases, however, we do not observe actors engaging in such a race. Top AI labs, for example, regularly publish results on capabilities but continue to invest significantly in safety research, defying the predictions in Armstrong et al. (2016). What’s going on? In our paper, we find that when the technological development path is even moderately uncertain, even actors who know they are close in capability are unwilling to engage in a dangerous race to the bottom because the expected returns to doing so aren’t worth the risk of an accident.
We build on the model in Armstrong et al. (2016) (see “AI race” for a useful description) to study a technology race with a safety-performance tradeoff in which capabilities investments are unknown, privately known, or publicly known. We argue, however, that their results depend on two unrealistic assumptions: that the actor with the higher capability investment wins the race with certainty, and that only the winner’s actions contribute to the risk of an accident. In our work, we relax both assumptions.
First, we argue that actors are unlikely to be certain about how additional units of capability investment translate into success in the race. In a race for a novel technology, there is likely to be randomness in the development process. For example, while Enrico Fermi was the first to use graphite as a neutron moderator, the lagging Soviets had actually already made such a discovery but failed to apply it to their nuclear program. We capture this level of randomness with a decisiveness parameter. If the parameter is zero, each actor is equally likely to win the race, regardless of effort. As it increases, the more capable actor is increasingly likely to win the race. Thus, in more decisive races, each additional unit of investment in capability is more likely to bring success.
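To make the role of the decisiveness parameter concrete, here is a minimal sketch in Python. It assumes a ratio-form contest success function, which is a common choice in the contest literature but is not necessarily the functional form used in the paper. The function name `win_probability` and its arguments are hypothetical and introduced only for illustration.

```python
def win_probability(cap_a: float, cap_b: float, decisiveness: float) -> float:
    """Probability that actor A wins the race.

    ASSUMPTION: a ratio-form contest success function,
        P(A wins) = cap_a**d / (cap_a**d + cap_b**d),
    chosen for illustration; the paper's exact form may differ.
    """
    if cap_a <= 0 or cap_b <= 0:
        raise ValueError("capabilities must be positive")
    if decisiveness < 0:
        raise ValueError("decisiveness must be non-negative")
    # Work with the capability ratio so that only one power is computed.
    ratio = (cap_b / cap_a) ** decisiveness
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    # Actor A has twice the capability of actor B.
    for d in (0.0, 1.0, 10.0):
        print(f"decisiveness={d:>4}: P(A wins) = {win_probability(2.0, 1.0, d):.4f}")
```

Under this assumed form, a decisiveness of zero gives each actor a one-half chance of winning regardless of capability, and the more capable actor's chance of winning approaches one as decisiveness grows, matching the qualitative behavior described above.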
What are the effects? At low levels of decisiveness, the randomness outweighs the benefits of taking risks, so the race is perfectly safe under all information states. As decisiveness increases, risk almost always increases. The ranking of information states also changes: at first the private information case is the most dangerous, and only at high levels of decisiveness does the public information case become the most dangerous. In other words, there is no information hazard unless the race is highly decisive. For less decisive races, it is better to reveal capabilities than to keep them private.
3 Safety provision
The other assumption that we challenge is that only the winner of the race contributes to safety. Instead, we add a safety-sharing parameter so that overall safety is a weighted sum of both actors’ contributions. We might imagine, for example, that a lagging actor could conduct tests, either in preparation for the mature technology or at a lower level of technological competence in response to the leader. We find that increasing the proportion of risk contributed by the losing actor produces two effects. First, a moral hazard effect increases risk. Just as with reducing carbon emissions, the less actors capture the benefit of their own safety provision, the more they will shirk. Second, less capable actors increase their safety provision because their choices matter even if they lose the race. We find that the moral hazard effect is stronger, so reducing the winner’s share of safety provision increases expected disaster risk.
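The weighted sum can be sketched as follows. This is an illustrative reading rather than the paper's specification: it assumes each actor's safety level lies in [0, 1] and that disaster risk is one minus overall safety. The names `disaster_risk` and `loser_share` are hypothetical.

```python
def disaster_risk(safety_winner: float, safety_loser: float, loser_share: float) -> float:
    """Disaster risk when both actors contribute to overall safety.

    ASSUMPTIONS (for illustration only):
      - safety levels and loser_share all lie in [0, 1];
      - overall safety = (1 - loser_share) * safety_winner + loser_share * safety_loser;
      - disaster risk = 1 - overall safety.
    """
    for name, value in (("safety_winner", safety_winner),
                        ("safety_loser", safety_loser),
                        ("loser_share", loser_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    overall_safety = (1.0 - loser_share) * safety_winner + loser_share * safety_loser
    return 1.0 - overall_safety


if __name__ == "__main__":
    # Hold safety choices fixed and vary only the loser's share of risk.
    for share in (0.0, 0.25, 0.5):
        print(f"loser_share={share}: risk = {disaster_risk(0.8, 0.2, share):.2f}")
```

Setting `loser_share` to zero recovers the case in which only the winner's safety matters. The sketch holds safety choices fixed, so it shows only the mechanical effect of the weights; the moral hazard effect described above arises because actors adjust their safety choices in response to the share, which this snippet does not model.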
As a result of this work, we present three tentative conclusions. First, when deciding whether to incentivize actors to reveal capabilities, it is important to know how uncertain they are about technological progress. At least at the early, uncertain stages of development, public knowledge of capabilities seems to be beneficial. Second, if losers of the race can also cause a disaster, policymakers should shift their focus away from low-capability to high-capability actors, providing them with incentives to reduce shirking, for example by sharing safety knowledge with the leader. Third, we caution against updating too strongly on any specific model (including our own!) and instead urge readers to reason through or empirically test its modeling assumptions in the context of real-world races.
Robert Trager, Paolo Bova, Nicholas Emery-Xu, Eoghan Stafford, and Allan Dafoe, "Welfare Implications of Safety-Performance Tradeoffs in AI Safety Research," Working paper, August 2022.
Stuart Armstrong, Nick Bostrom, and Carl Shulman, "Racing to the precipice: a model of artificial intelligence development," AI & Society, 31(2):201–206, May 2016, http://link.springer.com/10.1007/s00146-015-0590-y.
Toby Ord, "Lessons from the development of the atomic bomb," Working paper.