I'd like AI researchers to establish a consensus (>=70%) opinion on this question: What properties would a hypothetical AI system need to demonstrate for you to agree that we should completely halt AI development?
There are at least two ways in which agreement among researchers on this question seems useful:
1. Labs could then attempt to build non-lethal, non-AGI 'toy' systems that demonstrate the agreed-upon properties. If a sufficiently powerful AI system would be lethal to humans, this work could potentially prove it before such a system is built.
2. If it turns out we need to completely shut down AI development, that will only happen with government help. We would likely need a consensus opinion that we are in the process of building a system lethal to humans in order to compel governments to shut it all down.
Building this kind of consensus takes time, and I think we should start trying now!
I wrote in a previous post that I would like to prepare governments to respond quickly and decisively to AGI warning shots. A consensus on this question would both clarify what those warning shots might look like and, if alignment is failing catastrophically, give us a better chance of noticing before it's too late.
I agree that getting a consensus about "in what situation does everyone agree to STOP" is a good idea.
I also think this has downsides, such as "everyone will continue uninterrupted until then", "maybe the criteria won't be well-defined enough", and "that's just one final safety to rely on". Still, with the disclaimer that I haven't thought about this enough to endorse it as a world policy, I agree.
I definitely think "that's just one final safety to rely on" applies to this suggestion. I hope we do a lot more than this!
Excellent suggestion. I think the main benefit of asking leading AI researchers your question ('What properties would a hypothetical AI system need to demonstrate for you to agree that we should completely halt AI development?') would be that many of them would answer 'There are no AI properties that would make me advocate for halting AI development'. (For example, given their recent rhetoric on Twitter, I can't imagine Yann LeCun or any of the hard-core AI accelerationists arguing for a halt under any realistic conditions.)
It would be valuable for ordinary citizens to see such responses, because it would clarify that, for many AI advocates, the AI itself is the goal, and any impacts on humanity are considered trivial, tangential, or transient. In other words, the AI accelerationists would reveal themselves as ideologues who view humanity as a disposable bridge to superintelligence, and ordinary folks would be horrified and galvanized to push sooner for stronger pauses and/or halts.