(Note: So far, I'm very much in favor of work on AI safety. This question isn't intended to oppose work on AI safety, but to better understand it and its implications.)
(Edit: The point of this question is also to brainstorm some possible harms of AI safety and see if any of these can produce practical considerations to keep in mind for the development of AI safety.)
Is there any content that investigates the harms that could come from AI safety? I've so far only found the scattered comments listed below. All types of harm are relevant, but I mostly had in mind harm that could come from AI safety work going as intended, rather than from it going wrong (an example of the latter: the work being misrepresented, de-legitimized as a result, and then neglected in a way that causes harm). In a sense, that second kind of harm seems much less surprising, because the final mechanism of harm is still what proponents of AI safety are concerned about (chiefly, unaligned AI). Here, I'm a bit more interested in "surprising" ways the work could cause harm.
- "AI safety work advancing AGI more than it aligns it"
- "influencing [major] international regulatory organisation in a way leading to creating some sort of 'AI safety certification' in a situation where we don’t have the basic research yet, creating false sense of security/fake sense of understanding" and "influencing important players in AI or AI safety in a harmful leveraged way, e.g. by bad strategic advice"
[slightly lazy response] You may be interested in some of the sources linked to from the following pages:
Some of the sources list ways AI safety work specifically could be harmful, whereas others list more general types of / pathways to accidental harm which also happen to be relevant to AI safety work.
(Overall, I think that a lot of AI safety work is very valuable and that people shouldn't let somewhat generic worries about accidental harm strongly push them away from doing it, but also that it's good to be aware of some accidental harm pathways, get feedback from sensible people before making big moves, etc. Obviously that sentence is fairly vague! But the above links can provide more details.)