I think this discussion was had on LW a few years ago (and probably sporadically since then). Quickly, here are some considerations off the top of my head.
Pro:
- Improves forecasting
- Necessary infrastructure for a variety of verification tech that will be needed for international treaties
- Know when to sound alarm bells
- Helps us know what type of defensive technologies we need to build
Cons:
- Increases the speed of AI development
- Unclear if US and China are even interested in coordinating
- Unclear if a number from an eval will be enough to cause significant political pressure
Unsure:
- Depending on the trajectory of benchmarking, builds/kills hype and increases/reduces investment.

A relevant question I'm not sure about: for people who talk to politicians about AI risk, how useful are benchmarks? I'm not involved in those conversations so I can't really say. My guess is that politicians are more interested in obvious capabilities (e.g. Claude can write good code now) than they are in benchmark performance.