How can organizations like Anthropic which develops tools to test and understand machine learning systems' safety make sure that real-world deployment of AI systems are safe? Why would top AI firms devote time and money to work on safety, and how can we make sure that's done right?
[The reason that this might be important to do now, on current ML models, is to build credentials and expertise to be able to influence more important models later on]
One option is to build free tools and share research broadly, so that it'd be easier for any organization to test its own safety. Another could be to work on building a culture among AI engineers that promotes safety concerns. Another yet could be to work on better governance and legislation. There are probably many more.
Thought #1: Have a "Safe AI" approval stamp by a strong AI safety team. We can imagine companies paying for this stamp on their product, so that the public would feel more secure in using that product.
I don't really think this is promising, as I don't really think the public cares much about safety and wouldn't understand it anyway. Even if it would work, I think a likely failure mode would be a competing "approval stamp company" that'd have much lower standards and wouldn't scale up to transformative AI.
Thought #2: Have this kind of approval mandated by law.
I'm unsure about the feasibility of this, but I'm pretty skeptical - especially as we are talking about a global arrangement. I'm very skeptical of the ability of governments to have a strong and adequate team to do this.
Thought #3: Same as #1, except that the safety organization also takes on itself the risk of getting sued, and in fact operates as an insurance company.
I'm unsure of the legal details. It seems like it could be an interesting for-profit idea, which could start with autonomous vehicles and fairness/discrimination issues and grow from there.
However, having this kind of structure wouldn't be aligned with preventing GCRs, as there wouldn't be anyone to sue for such extreme events (and even if there would be, the harm would be much greater than can be compensated for).