Hi, I'm Max :)
Thanks for writing this up, I think it's a really useful benchmark for tracking AI capabilities.
One minor point of feedback: instead of reporting statistical significance in the summary, I'd report effect sizes, or maybe even better, just put the discrimination plots in the summary, as they give a very concrete and striking sense of the difference in performance. Statistical significance depends on how many datapoints you have, which makes the lack of a significant difference especially hard to interpret in terms of how much the difference matters in the real world.
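To make the sample-size point concrete, here's a rough sketch with made-up numbers (the means, the standard deviation, and the group sizes are all hypothetical, not taken from the paper). It holds the effect size fixed and only varies the number of datapoints, using a simple normal approximation for the p-value:

```python
import math


def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized effect size: the mean difference in units of the pooled SD."""
    return (mean_a - mean_b) / pooled_sd


def two_sided_p(mean_a, mean_b, pooled_sd, n_per_group):
    """Two-sided p-value from a z-test (normal approximation, equal group sizes)."""
    standard_error = pooled_sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical scores: same means and spread every time, only n changes.
MEAN_A, MEAN_B, POOLED_SD = 0.60, 0.55, 0.25

for n in (20, 200, 2000):
    d = cohens_d(MEAN_A, MEAN_B, POOLED_SD)
    p = two_sided_p(MEAN_A, MEAN_B, POOLED_SD, n)
    print(f"n per group = {n:4d}   Cohen's d = {d:.2f}   p = {p:.2g}")
```

The effect size stays the same in every row, while the p-value goes from clearly non-significant to extremely significant purely because of n. That's why I find a "no significant difference" result hard to read without the effect size next to it.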
Thanks for sharing, I really appreciate your commitment, and that you announced it.
Fwiw, my immediate reaction is that this type of protest might be a little too soon, and that it will cause more ridicule and backlash because the impression of the general public and the news media is that there is currently no immediate danger. I'd be interested in learning more about the timing considerations. Like, I'd imagine that doing this barricading in the aftermath of some concrete harm would make favorable news coverage much more likely, and then you could steer the discourse towards future and greater harms.
Thanks so much for all your contributions Lizka! :) I really appreciated your presence on the forum, like a friendly, alive, and thoughtful soul that was attending to and helping grow this part of our ecosystem.
Thanks for doing this work, this seems like a particularly useful benchmark to track the world model of AI systems.
I found it pretty interesting to read the prompts you use, which are quite extensive and give a lot of useful structure to the reasoning. I was surprised to see in Table 16 that the zero-shot prompts had almost the same performance level. The prompting kinda introduces a bunch of variance I imagine, and I wonder whether I should expect scaffolding (like what https://futuresearch.ai/ is presumably focusing on) to cause significant improvements.
Thanks, that all makes sense and moderates my optimism a bit, and it feels like we've roughly exhausted the depth of my thinking. Sigh... anyways, I'm really thankful, and maybe also optimistic, for the work that dedicated and strategically thinking people like you have been and will be doing for animals.
(Just quick random thoughts.)
The more that Trump is perceived as a liability for the party, the more likely they would go along with an impeachment after a scandal.