I think that adversarial collaborations are a good way of understanding competing perspectives on an idea, especially one that is polarising or controversial.
The term was introduced by Daniel Kahneman. The basic idea is that two people with competing perspectives on an issue work together towards a joint belief. Working in good faith, they can devise experiments and discussions that clarify the issue and move them towards that shared position. (Kahneman uses the word "truth", but I think "belief" is more justified in this context.)
AI x-risk is a good topic for a public adversarial collaboration
First, the issue is especially polarising. People working on AI risk believe that AI presents one of the greatest challenges to humanity's survival. On the other hand, AI research organisations, by both revealed preference (they're going full speed ahead on building AI capabilities) and stated preference (see this survey too), think the risk is much lower.
In my opinion, an adversarial collaboration between a top AI safety person (who works on x-risk from AI) and someone who does not think the x-risks are substantial would have clear benefits:
- It would make the lines of disagreement clearer. As an outsider to the space, I find it unclear where exactly people disagree and to what extent. A collaboration would clear that up and could provide a baseline for future debate.
- If it were co-written by someone respected in the field, it would also go a long way towards legitimising x-risk concerns.
- Finally, it would make each side evaluate the other's arguments carefully and see its own blind spots more clearly. This would improve the overall epistemic quality of the AI x-risk debate.
How could this go wrong?
- The main failure mode is that the parties writing it aren't acting in good faith. If either of them writes it with the aim of proving the other side wrong, it will fail badly.
- The second failure mode is that the arguments on either side rely too heavily on thought experiments, making a resolution hard to reach because neither side has much empirical grounding. In Kahneman's example, even with actual experiments to draw on, the parties could not reach agreement for 8 years. The same could easily happen here.
Other key considerations
- Finding the right people from both sides of the debate might be more difficult than I assume. I think there are people who could do it (e.g. Richard Ngo and Jacob Buckman have said that they have done this in private), and Boaz Barak and Ben Edelman have published a thoughtful critique (although not an adversarial collaboration), but it may be that they are too busy or not interested enough.
- Something similar has been done before, so this might risk duplicating it. I don't think this is the case, because that debate was hard to follow and was not explicitly written with the intent of finding a joint belief.
I think adversarial collaborations are very interesting, so I am curious whether anyone has done any work on making this technique scale better. For example, has anyone written a good manual on how to run one?
A starting point may be these two posts on an adversarial collaboration contest from 2019: https://slatestarcodex.com/2019/12/09/2019-adversarial-collaboration-entries/ and https://slatestarcodex.com/2020/01/13/2019-adversarial-collaboration-winners/.
There aren't many insights relating directly to scaling, however. The main takeaways seem to be that (a) it is a lot of work to coordinate, (b) many teams dropped out, and (c) providing a template and perhaps some formatting instructions may be useful.