My impression (though I haven't researched this well) is that such prizes have historically inspired many people to try to solve the problem, and they bring a ton of publicity both to the problem itself and to why it is difficult.
I'm not sure whether the money would need to be held somewhere in the meantime, but if not, this seems like an extremely easy offer: if some person or group solves the problem, great, they get the money and it's money well spent; if not, the money gets spent on something else. If the money would need to be reserved and couldn't be spent in the meantime, this becomes a much more nuanced cost-benefit analysis, but I still think it might be worth considering.
Has this idea been discussed already? What are the counterarguments?
The main challenge seems to be formulating the goal in a sufficiently specific way. We don't currently have a benchmark that would serve as a clear indicator that the alignment problem has been solved; right now, any proposed solution ends up being debated by many people who often disagree about its merits.
The FTX Future Fund listed AI Alignment Prizes on its ideas page and would be interested in funding them. Given that, coming up with clear targets for AI safety research seems like it would be very impactful.
My colleagues have often been way too nice about reading group papers, rather than the opposite. (I’ll bet this varies a ton lab-to-lab.)