Allan Dafoe

Director, Centre for the Governance of AI at the Future of Humanity Institute

Yes, I was compressing a lot of ideas into a small space!

There is some conceptual slippage here. By "safety" I'm referring to the safety of the developer (and maybe the world), not of a consumer who decides which product to buy. As you note, if safety is a critical component of winning a competition, then safety will be incentivized by the competition.

It is generally hard to separate out (self-)safety from competitive performance; we might best conceptualize "safety" as the residual of safety (the neglected part) after conditioning on performance (i.e., we focus on the parts of safety that are not an input to performance).
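One way to make the "residual" framing concrete is a regression analogy (my own toy sketch, with made-up numbers, not anything from the papers below): regress total safety effort on competitive performance; the residual is the part of safety that performance doesn't already pay for.

```python
# Toy sketch: "safety as the residual after conditioning on performance."
# The data are made up: (performance, total_safety_effort) for hypothetical actors.
data = [(1.0, 0.9), (2.0, 1.5), (3.0, 2.2), (4.0, 2.4), (5.0, 3.4)]

# Ordinary least squares by hand: fit safety ~ a + b * performance.
n = len(data)
mean_p = sum(p for p, _ in data) / n
mean_s = sum(s for _, s in data) / n
slope = (sum((p - mean_p) * (s - mean_s) for p, s in data)
         / sum((p - mean_p) ** 2 for p, _ in data))
intercept = mean_s - slope * mean_p

for p, s in data:
    residual = s - (intercept + slope * p)  # safety not explained by performance
    print(f"performance={p:.1f}  safety={s:.1f}  neglected residual={residual:+.2f}")
```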

Increased competitive pressure will (1) induce some actors to drop out of the competition, since they can't keep up, and (2) lead those who stay in to invest more, in equilibrium, in winning the competition and less in other goods. This happens whenever there is some tradeoff or fungibility between winning the competition and anything else of value. You can think of "competitive pressure" as the slope of the function mapping one's investment in winning to the probability that one wins the competition (holding the other's investment fixed). If the contest is a lottery, then the slope is flat. If one is so far ahead or behind that effort doesn't make a difference, the slope is also flat. But if we are neck and neck, and investment maps into performance, then the marginal incentive to invest can get high.
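To make the slope idea concrete, here is a minimal sketch assuming a logistic difference-form contest success function (my choice for illustration; the point doesn't depend on this particular functional form):

```python
import math

# Sketch (illustrative assumption): a difference-form contest where the
# probability of winning depends on the gap between the two investments.

def win_prob(x: float, y: float, k: float = 1.0) -> float:
    """P(win) for a player investing x against a rival investing y."""
    return 1.0 / (1.0 + math.exp(-k * (x - y)))

def pressure(x: float, y: float, k: float = 1.0) -> float:
    """Competitive pressure: the slope of win_prob in one's own investment,
    holding the rival's investment fixed. For the logistic form this is
    k * p * (1 - p)."""
    p = win_prob(x, y, k)
    return k * p * (1.0 - p)

rival = 5.0
for own in [0.0, 2.5, 5.0, 7.5, 10.0]:
    print(f"own={own:4.1f}  p(win)={win_prob(own, rival):.3f}  "
          f"pressure={pressure(own, rival):.3f}")
# The slope is near zero when one side is far ahead or far behind, and
# peaks when the race is neck and neck (own == rival).
```

Setting k = 0 recovers the pure-lottery case: the win probability is 1/2 regardless of investment, so the slope is flat everywhere.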

You are right that in the presence of a safety externality this race to the bottom can get much worse. I think of Russell's point about optimization setting "the remaining unconstrained variables to extreme values". In the Cold War, nuclear brinkmanship yielded advantage, and so world leaders gambled with nuclear annihilation.

But this dynamic can still happen even if the risk is only to oneself. An individual can rationally choose to accept a greater risk of harm in exchange for keeping up in a now more intense contest. The "war of attrition" game (or, equivalently, the "all-pay auction") exemplifies this: each player pays only what they bid, and yet in symmetric equilibrium actors do burn resources competing for the prize (the more so the more valuable the prize).
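A quick simulation of that rent dissipation, assuming the textbook two-player complete-information all-pay auction (a standard result, not anything specific to the papers below): in the symmetric mixed equilibrium each player bids uniformly on [0, V], each earns zero in expectation, and the full prize is burned in expectation.

```python
import random

# Sketch: rent dissipation in a two-player all-pay auction with prize V.
# Both players pay their bids, win or lose; bids are drawn from the
# symmetric mixed equilibrium (uniform on [0, V]).

def simulate(prize: float, trials: int = 200_000, seed: int = 0):
    rng = random.Random(seed)
    total_bids = 0.0
    payoff_1 = 0.0
    for _ in range(trials):
        b1 = rng.uniform(0.0, prize)
        b2 = rng.uniform(0.0, prize)
        total_bids += b1 + b2  # everyone pays, win or lose
        payoff_1 += (prize if b1 > b2 else 0.0) - b1
    return total_bids / trials, payoff_1 / trials

burned, payoff = simulate(prize=10.0)
print(f"expected resources burned={burned:.2f}, expected payoff={payoff:.2f}")
# Expect burned ≈ 10.0 (the whole prize dissipated) and payoff ≈ 0.0.
```

Doubling the prize doubles the expected resources burned, which is the "more so the more valuable the prize" point.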

The externality in a race is that anything I do to increase my chance of winning reduces your chance of winning, and so "investment in winning" will be oversupplied relative to the socially optimal level.
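A worked example of that oversupply, assuming a symmetric two-player Tullock contest (again a standard model chosen for illustration): each player invests x_i, wins with probability x_i / (x_1 + x_2), and pays their investment.

```python
# Sketch: equilibrium vs. socially optimal investment in a Tullock race.

def best_response(rival: float, prize: float, grid: int = 10_000) -> float:
    """Numerically maximise prize * x / (x + rival) - x over x in [0, prize]."""
    best_x, best_u = 0.0, 0.0
    for i in range(1, grid + 1):
        x = prize * i / grid
        u = prize * x / (x + rival) - x
        if u > best_u:
            best_x, best_u = x, u
    return best_x

prize = 10.0
x = 1.0
for _ in range(30):  # iterate best responses toward the symmetric equilibrium
    x = best_response(x, prize)
print(f"equilibrium investment per player ≈ {x:.2f} (theory: V/4 = {prize / 4})")
# Someone wins the prize whatever the players invest, so from a joint
# perspective every unit of investment here is pure waste: the socially
# optimal joint investment is ~zero, yet in equilibrium the players
# together burn ~V/2 on winning.
```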

Overall, thinking clearly through the strategic dynamics of models like these can be very complex, and one can often generate counter-intuitive results. Robert Trager, formerly myself, and others have a lot of work exploring the various strategic wrinkles. I'm not sure what the best link is; here are some places to start that I'm more familiar with: https://forum.effectivealtruism.org/posts/c73nsggC2GQE5wBjq/announcing-the-spt-model-web-app-for-ai-governance

Eoghan Stafford, Robert Trager, and Allan Dafoe, "Safety Not Guaranteed: International Strategic Dynamics of Risky Technology Races," working paper, July 2022. https://drive.google.com/file/d/1zYLALn3u8AhuXA4zheWmrD7VHmI6zTvD/view?usp=sharing