I think this is a very important point:
> if aligning AI with human values has historically resulted in catastrophic outcomes, how can we ensure that AI alignment will not amplify the very harms we aim to prevent?
We keep putting our money, time, and trust into the idea that our current human values are the way out, even though the empirical record shows what those values look like in action: our behavior has been power-seeking, we are responsible for the largest extinction event, killing off all of our cousin species, and we keep doubling down on technology and ever more extraction of value from nature.

Instilling AI systems with these "human" values is itself an S-risk.