My impression (not well researched though) is that such prizes have served in the past to inspire many people to try to solve the problem, and they bring a ton of publicity to both the problem itself and to why the problem is difficult.
I'm not sure whether the money would need to be held in escrow in the meantime, but if not, this seems like an extremely easy offer: if some person or group solves the problem, great, the money is well spent; if not, the money gets spent on something else. If the money would need to be reserved and couldn't be spent in the meantime, this becomes a much more nuanced cost-benefit analysis, but I still think it might be worth considering.
Has this idea been discussed already? What are the counterarguments?
I like the TruthfulQA idea/paper a lot, but I think incentivizing people to optimize against it probably wouldn't be very robust: ideas with no alignment relevance could end up driving much of the improvement.
Just one of several issues: the authors selected questions adversarially against GPT-3 (i.e., they oversampled exactly the questions GPT-3 got wrong), so simply swapping in an equally misaligned but different model, such as Gopher, should yield significantly better performance. That's really not something you want to see in an alignment benchmark.
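To see why adversarial selection alone inflates a different model's score, here's a toy simulation (my own illustration, not from the paper): two models with the same underlying error rate, where the benchmark keeps only questions model A got wrong. Model B looks much better despite being equally capable.

```python
import random

random.seed(0)

N = 100_000
ERROR_RATE = 0.4  # assumed: both models wrong on 40% of candidate questions, independently

# Each question: (did A answer wrong?, did B answer wrong?)
questions = [(random.random() < ERROR_RATE, random.random() < ERROR_RATE)
             for _ in range(N)]

# Adversarial selection: keep only questions model A answered incorrectly.
benchmark = [q for q in questions if q[0]]

acc_a = sum(not a for a, b in benchmark) / len(benchmark)  # 0 by construction
acc_b = sum(not b for a, b in benchmark) / len(benchmark)  # ~0.6, despite identical ability

print(f"A: {acc_a:.2f}  B: {acc_b:.2f}")
```

The gap between the two scores here reflects nothing but the selection procedure, which is the worry: a new model can "beat" the benchmark without being any more truthful.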