I ran the following experiment on Petrov Day in the EA Hangouts Facebook group:

Hey everyone, it's Petrov day and I wanted to run a little scenario.

I'm going to donate $100 each to AMF and MIRI. Unless someone comments with "nuke MIRI" or "nuke AMF". In that case, I'll give people 20 minutes to respond by nuking the other org, and close the option to respond after that.

If neither orgs get nuked by midnight tonight, I'll give $100 to both orgs. If both orgs get nuked, they'll both get nothing. If only one org gets nuked, the nuked org will get nothing and the other org will get $300.

I'd prefer if no one nuked anyone, but I promise to honor the outcome of the experiment and donate accordingly. These are real donations that I wouldn't have made otherwise.

At 23:23, MIRI was nuked with no retaliation against AMF, and I donated $300 to AMF. Before this happened, there were a lot of interesting game dynamics, and I learned more than I expected to!
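The payout rules from the announcement fit in a small function. This is only an illustrative sketch: the name `donations` and its two boolean arguments are my own framing, and it ignores the 20-minute retaliation window and the recall option. Under these rules the actual outcome (MIRI nuked, no retaliation) maps to $300 for AMF.

```python
def donations(miri_nuked: bool, amf_nuked: bool) -> dict:
    """Donation to each org under the announced Petrov Day rules (sketch)."""
    if not miri_nuked and not amf_nuked:
        return {"MIRI": 100, "AMF": 100}  # nobody launched successfully
    if miri_nuked and amf_nuked:
        return {"MIRI": 0, "AMF": 0}      # strike plus counterstrike
    # Exactly one org was nuked: it gets nothing and the other org gets $300.
    if miri_nuked:
        return {"MIRI": 0, "AMF": 300}
    return {"MIRI": 300, "AMF": 0}


# Print all four outcomes with the total amount donated in each.
for miri_nuked in (False, True):
    for amf_nuked in (False, True):
        d = donations(miri_nuked, amf_nuked)
        print(miri_nuked, amf_nuked, d, sum(d.values()))
```

The totals make the structure visible: $200 if nobody defects, $300 after a single unanswered strike, and $0 after retaliation.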

I created this experiment because I always felt that Petrov Day exercises offered little real incentive to launch an attack. The Cold War was not like this: there were real fears in both the USSR and the US that the other power represented an existential threat. After talking it through with some friends, I decided to go with a Prisoner’s Dilemma-style game and see what happened.

Less than half an hour after I posted the original message, someone asked why anyone would ever counternuke. I replied that they might counternuke because they had precommitted to do so, in order to deter someone from nuking their preferred organization. Immediately after this, they commented “nuke AMF”. Another commenter asked whether it was possible to cancel an attack. I hadn’t considered this, so I ruled that attackers could cancel by deleting or editing their comment within 20 minutes of posting it.

This led to an interesting dynamic. A few minutes later, someone else commented “nuke MIRI”, with a follow-up message saying they would delete this comment if the original attacker deleted their “nuke AMF” comment. Both comments were then deleted by their authors.

This pattern repeated two more times: someone posted an initial launch, and someone else launched a counterattack, promising to abort it if the original attack was canceled. One of the attacks came very close to detonating, with about 10 seconds to spare.

Someone else offered a $100 bounty, paid to AMF, if someone nuked MIRI without retaliation, and another person added a precommitted $50 consolation donation to MIRI for the same scenario. Another person launched attacks against both orgs to try to extort more donations from group members, then aborted both launches without any successful extortion.

My favorite suggestion, from Oliver Habryka and Avi Norowitz, was to precommit to nuking one organization at random in a publicly verifiable way, such as by using the NIST public randomness beacon. This way, each organization's expected donation would be $150 (a 50% chance of $300), better than the $100 each achieved by default, and no one would need to defect. I was excited to see this suggested and endorsed by a number of people, but no one ended up trying it.
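A sketch of the arithmetic behind the coin-flip proposal, plus one possible way a published random value could select the target. The `pick_target` helper, its even/odd convention, and the sample hex strings are hypothetical; any real scheme would need its mapping announced before the random value is published, and this code does not fetch anything from the NIST beacon.

```python
from fractions import Fraction

# Payouts when exactly one org is nuked and nobody retaliates (from the rules).
PAYOUT_IF_NUKED = {
    "MIRI": {"MIRI": 0, "AMF": 300},
    "AMF": {"MIRI": 300, "AMF": 0},
}
PAYOUT_IF_NO_LAUNCH = {"MIRI": 100, "AMF": 100}


def expected_payout_of_coin_flip() -> dict:
    """Exact expected donation per org if the nuked org is chosen 50/50."""
    half = Fraction(1, 2)
    return {
        org: half * PAYOUT_IF_NUKED["MIRI"][org] + half * PAYOUT_IF_NUKED["AMF"][org]
        for org in ("MIRI", "AMF")
    }


def pick_target(beacon_hex: str) -> str:
    """Hypothetical convention: an even value nukes MIRI, an odd value nukes AMF.

    `beacon_hex` stands in for a publicly published random value written as a
    hex string; the convention must be fixed before that value is revealed.
    """
    return "MIRI" if int(beacon_hex, 16) % 2 == 0 else "AMF"


print(expected_payout_of_coin_flip())  # → {'MIRI': Fraction(150, 1), 'AMF': Fraction(150, 1)}
print(PAYOUT_IF_NO_LAUNCH)
print(pick_target("3fa9"))
```

Using exact fractions keeps the comparison clean: $150 expected per org under the coin flip versus a certain $100 per org if nobody launches.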

The final post by Alexandre Zani read:

I'm going to sleep. If tomorrow morning nobody has counter-nuked without cancelling, I will make a $50 donation (with an extra $50 matched by my employer for a total of $100) to MIRI.

Nuke MIRI.

Good night everyone!

No one responded, and I’m not sure whether anyone saw it in time. I thought this final attack was clever: it waited until the clock had almost run out, implied a precommitment in the form of bedtime, and provided a disincentive for retaliation. That said, I’m disappointed that no cooperative solution was reached.

Allowing people to recall their attacks definitely changed the game and made the negotiations more interesting. Part of me was tempted to make the launch aborts probabilistic. In fact, someone simulated this by launching an attack and precommitting to abort it with an 11/12 chance.
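A sketch of what an 11/12 abort precommitment is worth in expectation, assuming it is the only live launch and nobody retaliates. The post doesn't say which org that attacker targeted, so the target is a parameter; `expected_payouts` is an illustrative name.

```python
from fractions import Fraction


def expected_payouts(target: str, p_abort: Fraction) -> dict:
    """Expected donation per org for a single launch at `target` that is aborted
    with probability `p_abort`, assuming no retaliation and no other launches."""
    other = "AMF" if target == "MIRI" else "MIRI"
    p_hit = 1 - p_abort
    return {
        target: p_abort * 100,               # $100 if aborted, $0 if the strike lands
        other: p_abort * 100 + p_hit * 300,  # $100 if aborted, $300 if the strike lands
    }


ev = expected_payouts("MIRI", Fraction(11, 12))
print({org: float(v) for org, v in ev.items()})
```

With these assumptions, the targeted org's expected donation falls from $100 to about $91.67, the other org's rises to about $116.67, and the expected total rises from $200 to about $208.33.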

It seems worth noting that while allowing recalls made the game more interesting to me, it was not particularly historically accurate. In the Cold War, ICBM attacks could not be recalled. This left a narrower margin for error: decision makers like those Petrov reported to had to decide whether to “use them or lose them” in the event of a probable attack, and information obtained after launch could not be used to avert a nuclear war. Modern long-range missiles could in principle be built to be destroyed on command before reaching their targets, but such a capability probably would not be advertised if it existed, so it’s unclear whether it has been added.

Perhaps unsurprisingly, in a group with 1.6k members, people were quite willing to take action and launch nukes. I counted 10 launches in total, 9 of which were aborted, across 114 comments. A final takeaway for me was that while many people argued for cooperation, more of the action I saw went toward clever ways to win money for one org or the other. Bounties were posted for outcomes where one side got nuked, but none were posted for the outcome where neither side defected. Unilateral action was easier to take than building a consensus like the one Oli and Avi proposed, both offensively, by launching first strikes, and defensively, by launching second strikes that would proceed unless the first strike was recalled. If cooperation in these scenarios is desirable, then I think active cooperation strategies are needed when aggressive action is rewarded by default.

Thank you Miranda Dixon-Luinenburg / lesswrong for editing help!

Comments:

This is certainly an interesting experiment to read about! Question, though: does it technically count as a true prisoner’s dilemma if aggregate wellbeing increases with unilateral defection? I would have thought it wouldn’t be a prisoner’s dilemma unless the unilateral defection reward were somewhere between $100 and $200, rather than $300…? If I really wanted $100 to be donated to one of the orgs but also liked the other one (albeit not to the same extent), I might have just said something like “I will commit to nuking one org, then I will verifiably commit to donating >$100 to the org that gets nuked.”

I expect most people to think that either AMF or MIRI is much more likely to do good. So from most agents' perspectives, unilateral defection is only better if their chosen org wins. If someone takes more of a portfolio approach that weights longtermist and global poverty efforts similarly, then your point holds. I expect that's a minority position, though.

According to Wikipedia, $300 vs. $100 is fine for a one-shot prisoner's dilemma. But an iterated prisoner's dilemma would require (defect against cooperate) + (cooperate against defect) < 2 × (cooperate/cooperate), since the best outcome is supposed to be sustained cooperate/cooperate rather than alternating cooperation and defection.

However, the fact that this game gives out the same $0 for both cooperate/defect and defect/defect means it still doesn't count as an ordinary prisoner's dilemma. Defecting against a defector needs to be strictly better than cooperating against a defector. In fact, in this case, every EA is likely to put some positive value on $300 going to either MIRI or AMF, so cooperating against a defector is actually preferred to defecting against a defector.
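The conditions debated in this thread can be checked mechanically. This sketch uses the standard prisoner's dilemma labels (T temptation, R reward, P punishment, S sucker's payoff), with the one-shot condition T > R > P > S and the iterated condition 2R > T + S. The weight `w`, how much a player values a dollar to the other org relative to a dollar to their own, is an assumption used to model the "portfolio" view; `Payoffs`, `payoffs_for`, and the check functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    T: float  # temptation: I nuke, they don't retaliate
    R: float  # reward: nobody nukes
    P: float  # punishment: strike and counterstrike
    S: float  # sucker: I'm nuked and don't retaliate


def payoffs_for(w: float) -> Payoffs:
    """Value to a player: own org's dollars count 1, the other org's count w."""
    return Payoffs(
        T=300 + w * 0,    # my org $300, other org $0
        R=100 + w * 100,  # $100 each
        P=0 + w * 0,      # $0 each
        S=0 + w * 300,    # my org $0, other org $300
    )


def is_one_shot_pd(p: Payoffs) -> bool:
    return p.T > p.R > p.P > p.S


def satisfies_iterated_condition(p: Payoffs) -> bool:
    return 2 * p.R > p.T + p.S


for w in (0, 0.5, 1):
    p = payoffs_for(w)
    print(w, p, is_one_shot_pd(p), satisfies_iterated_condition(p))
```

With w = 0 the one-shot check fails only because P equals S; any w > 0 makes S exceed P, which is the point made in the last comment. The iterated condition fails for every w between 0 and 1, since 200 + 200w is never greater than 300 + 300w.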
