This policy brief provides policymakers with concrete recommendations for how governments can manage AI risks.
Policy recommendations:
1. Mandate robust third-party auditing and certification.
2. Regulate access to computational power.
3. Establish capable AI agencies at the national level.
4. Establish liability for AI-caused harms.
5. Introduce measures to prevent and track AI model leaks.
6. Expand technical AI safety research funding.
7. Develop standards for identifying and managing AI-generated content and recommendations.
I agree that behavioral science might be important to creating non-brittle alignment, and I am extremely bullish on behavioral science being critically valuable across AI alignment support, AI macrostrategy, and AI governance (including but not limited to neurofeedback). In fact, I think behavioral science is currently the core crux deciding AI alignment outcomes, and that it will be the main factor determining whether enough quant people end up going into alignment. If so, the behavioral scientists will be remembered as the superstars, and the quant people as interchangeable.
However, the overwhelming impression I get from the current ML paradigm is that we are largely stuck with black-box neural networks, and that these are extremely difficult and dangerous to align at all: they have a systematic tendency to produce insurmountably complex "spaghetti code" that is unaligned by default. I'm not an expert here; I specialized in a handful of critical elements of AI macrostrategy. But what I've seen so far indicates that the neural-network spaghetti code is much harder to work with than the human-alignment elements. I'm not strongly attached to this view, though.