The strongest claim in this post is almost a throwaway line: "AI intent alignment alone won't help if the human principal is fanatical or malevolent, and AI aligned with Stalin probably won't usher in utopia." This deserves to be the center of the analysis, not a supporting observation.
If alignment as obedience doesn't solve the problem when a principal is fanatical, then what we actually need isn't alignment; it's some structural constraint on decision-making that prevents fanatics from wielding AI-amplified power without accountability.
What makes fanatics in power so dangerous is the systematic destruction of accountability: figures like Stalin, Hitler, and Mao dismantled the structures that allowed affected people to push back on decisions. The people lost the ability to say, "This is wrong and it needs to stop."
The post's intervention section gets at this partially. "Strengthening and safeguarding liberal democracies" is the right direction, but the priority should not be identifying and preventing specific ideologies, which is a content-level intervention. Who decides which ideologies count as fanatical?
The more robust approach is structural: ensuring decision makers remain accountable to those affected by their decisions, building systems where consequences flow back to the people making choices, and constraining power concentration at the structural level rather than trying to filter beliefs.