Not sure I fully understand your point about "double-edged signaling games" - could you please clarify?
I think "more attention to AI ethics in a general way" is good: the line between ethical and safety concerns is blurry and quite artificial, and framing AI safety within the broader "responsible AI" discourse can attract more talent and support.
Good question. Benchmarks provide empirical, quantitative evaluation. They can be static datasets, e.g. ImageNet. They can also be models! For example, CLIP is a model that scores how well an image matches a piece of text, and it is used to evaluate text-to-image generation models like DALL-E 2, specifically how well the generated images align with their text prompts.
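To make the "model as benchmark" idea concrete, here is a rough sketch of one common CLIP-based metric, CLIPScore (Hessel et al., 2021), which computes w * max(cos(image_embedding, text_embedding), 0) with w = 2.5. It assumes the embeddings have already been produced by CLIP's image and text encoders; the helper names and the toy 2-D vectors in the usage example are made up for illustration and are not real CLIP outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style alignment: w * max(cos(image, text), 0).

    image_emb / text_emb are assumed to come from CLIP's image and text
    encoders; this function only does the scoring step.
    """
    return w * max(cosine(image_emb, text_emb), 0.0)

def mean_clip_score(pairs, w=2.5):
    """Average alignment over (generated-image embedding, prompt embedding) pairs."""
    return sum(clip_score(img, txt, w) for img, txt in pairs) / len(pairs)

# Hypothetical toy embeddings: one well-aligned pair, one unrelated pair.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [1.0, 0.0])]
print(mean_clip_score(pairs))  # 1.25
```

A higher average means the generator's images match their prompts more closely, as judged by CLIP. Note that the metric inherits CLIP's own blind spots, which is one reason model-based benchmarks are usually paired with human evaluation.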
The bottom line is that benchmarks should give AI labs and researchers a fair way to compare their systems with one another, and should track progress toward the goals the research community cares about.
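As a minimal sketch of what "fair comparison" means for a static-dataset benchmark: every model is scored on the same frozen examples with the same metric. The toy arithmetic task, the model names, and the helper functions below are all hypothetical and only illustrate the idea; real benchmarks like ImageNet have their own splits and metrics.

```python
from typing import Callable, Dict, List, Tuple

# A hypothetical frozen test set: (input, expected output) pairs.
Example = Tuple[str, str]

def accuracy(model: Callable[[str], str], dataset: List[Example]) -> float:
    """Fraction of examples where the model's output matches the label exactly."""
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)

def leaderboard(models: Dict[str, Callable[[str], str]],
                dataset: List[Example]) -> List[Tuple[str, float]]:
    """Score every model on the same examples with the same metric, best first."""
    scores = [(name, accuracy(fn, dataset)) for name, fn in models.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

dataset = [("2+2", "4"), ("3+3", "6"), ("1+1", "2"), ("5+5", "10")]
models = {
    "always_4": lambda x: "4",  # hypothetical weak baseline
    "sum_model": lambda x: str(sum(int(t) for t in x.split("+"))),
}
print(leaderboard(models, dataset))  # [('sum_model', 1.0), ('always_4', 0.25)]
```

Because the dataset and metric are held fixed, differences in score reflect the models rather than the evaluation setup, which is what makes results from different labs comparable.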
Hope this helps!