
Tl;dr: don’t be fooled into thinking that some groups working on AI are taking “safety” concerns seriously (enough).[1]


Note: I’m posting this in my personal capacity. All views expressed here are my own. I am also not (at all) an expert on the topic. 

Two non-AI examples


Companies “greenwash” when they mislead people into incorrectly thinking that their products or practices are climate and environment-friendly (or that the company focuses on climate-friendly work).

Investopedia explains:

  • Greenwashing is an attempt to capitalize on the growing demand for environmentally sound products.
  • The term originated in the 1960s, when the hotel industry devised one of the most blatant examples of greenwashing. They placed notices in hotel rooms asking guests to reuse their towels to save the environment. The hotels enjoyed the benefit of lower laundry costs.
    • Wikipedia: “[Jay Westerveld, the originator of the term] concluded that often the real objective was increased profit, and labeled this and other profitable-but-ineffective ‘environmentally-conscientious’ acts as greenwashing.” (Wikipedia also provides a long list of examples of the practice.) 

I enjoy some of the parody/art (responding to things like this) that comes out of noticing the hypocrisy of the practice.


A similar phenomenon is the “humanewashing” of animal products. There’s a Vox article that explains this phenomenon (as it happens in the US): 

A carton of “all natural” eggs might bear an illustration of a rustic farm; packages of chicken meat are touted as “humanely raised.”

In a few cases, these sunny depictions are accurate. But far too often they mask the industrial conditions under which these animals were raised and slaughtered.

Animal welfare and consumer protection advocates have a name for such misleading labeling: “humanewashing.” And research suggests it’s having precisely the effect that meat producers intend it to. A recent national survey by C.O.nxt, a food marketing firm, found that animal welfare and “natural” claims on meat, dairy, and egg packaging increased the intent to purchase for over half of consumers.


...rather than engaging in the costly endeavor of actually changing their farming practices, far too many major meat producers are attempting to assuage consumer concerns by merely changing their packaging and advertising with claims of sustainable farms and humane treatment. These efforts mislead consumers, and undermine the small sliver of farmers who have put in the hard work to actually improve animal treatment.

If you want a resource on what food labels actually mean, here are some: one, two, three (these are most useful in the US). (If you know of a better one, please let me know. I’d especially love a resource that lists the estimated relative value of things like “free-range” vs. “cage-free,” etc., according to cited and reasonable sources.)

Definition of safety-washing

In brief, “safety-washing” is misleading people into thinking that some products or practices are “safe” or that safety is a big priority for a given company, when this is not the case.

An increasing number of people believe that developing powerful AI systems is very dangerous,[2] so companies might want to show that they are being “safe” in their work on AI.[3]

Being safe with AI is hard and potentially costly,[4] so if you’re a company working on AI capabilities, you might want to overstate the extent to which you focus on “safety.” 

So you might: 

  • Pick a safety paradigm that is convenient for you, and focus on that
  • Talk about “safety” when you really mean other kinds of things the public might want an AI to be, like un-biased and not-hateful
  • Start or grow a safety team, feature it in media about your work (or conversations with safety-oriented people), but not give it a lot of power 
  • Promote the idea that AI safety concerns are crazy
  • And more

Some of these things might be better than doing nothing for safety concerns, but overall, (safety-)washing causes some problems (discussed in the next section), which in turn worsens the situation with risk from AI. 

Related: Perhaps It Is A Bad Thing That The World's Leading AI Companies Cannot Control Their AIs (Astral Codex Ten) 

What are the harms?

I don’t have the time to write a careful report on the matter, but here are some issues that I think arise from greenwashing, humanewashing, and safety-washing:

  1. Confusion: People working on the issue (and the general public) get confused about what really matters — terms lose their meanings, groups lose focus, etc.
  2. Accidental harm: People are misled about what companies are doing, which in turn leads to people doing directly harmful things they didn’t intend to do
    • E.g., this encourages people to work for harmful companies/projects or to support them financially because they’re not aware of the harm the companies cause.
  3. False security: Causes a false sense of safety/goodness/progress (which can lead to insufficient mitigation of the harm caused, a lack of other kinds of preparation, and other problems)
  4. Thwarted incentive: Reduces the incentive for companies to actually reduce the harm they (might) cause
    • If you’re a company and you can get away with labeling your product as safe/green/humane, which gets you the benefit of consumer approval and a lack of hate, you don’t need to put in extra work to actually make your work safe/green/humane.
  5. And more?

What can (and should) we do about this?

Some things that come to mind: 

  1. To counteract confusion, we can try to be more specific in explanations about “safety” or “humane conditions” or use more specific terms like “existential safety”
  2. To counteract our own confusion, we could encourage (even) more distillation of content and external validation of work
  3. Stare into the abyss about the possibility that our work is not useful (or is harmful), and seek external reviews and criticism
  4. We could also create or support standards for safety or external validation systems (like Certified Humane), and evaluate projects against that (e.g.) (although versions of this might be gameable, and we should beware new “standards” for the usual reasons).
  5. Call out safety-washing (and other kinds of washing).
  6. Call out organizations doing things that are bad on their merits, and be clear about why what they showcase as safety-oriented work (or efforts to be more humane, etc.) insufficiently addresses the risks and harms of their work.

How important or promising is all of this as an approach or a type of work to focus on? I’m not sure — I’d guess that it’s not the most valuable thing to focus on for most people, but would be interested in other people’s thoughts. My main motivation for writing this was that I think the phenomenon of safety-washing exists and will become more prominent, and we should keep an eye out for it. 

[Edit: this paper has some relevant discussion.]

Image credit: Dall-e.

I'm a bit swamped and may not respond to comments, but will probably read them and will be very grateful for them (including for corrections and disagreements!). 

"Safety-washing" might also be spelled "safetywashing." I don't know which is better or more common, and have gone with the former here.

  1. ^

    After I wrote a draft of this post, I noticed that there was a very similar post on LessWrong. I should have checked earlier, but I’m posting this anyway as it is slightly different (and somewhat more detailed) and because some Forum users may not have seen the LW version.

  2. ^

    Here are some resources you can explore on this topic if you want to learn more: one, two, three, four, five, six, seven.

  3. ^

    Safety isn’t the only thing that people care about, in terms of ethical concerns about AI, and it’s probably not the most popular concern. I’m focusing on safety in this post. Other concerns have been discussed in e.g. Forbes: Forbes discusses AI Ethics washing (paywalled) — “AI Ethics washing entails giving lip service or window dressing to claimed caring concerns about AI Ethics precepts, including at times not only failing to especially abide by Ethical AI approaches but even going so far as to subvert or undercut AI Ethics approaches.” I only skimmed the article but it seems to focus on self-driving cars as its motivating example. It also separates “washers” into four groups: those who wash by ignorance, by good-motivations-stretched-or-slipped, by stretching-the-truth or spinning it, and those who brazenly lie. It also describes “Ethics Theatre” — making a big show of your ethics work, “Ethics shopping” — picking the guidelines that are easiest to adopt, “Ethics bashing” — e.g. insisting the guidelines are worthless or a cover-up, “Ethics Shielding” — I didn’t quite follow this one, and “Ethics Fairwashing” — specifically focusing on claims that an AI is fair when it isn’t.

  4. ^

    If you think that AI risk is not minuscule, then being safe (even if it means being slow) is also in your interests; see this section of “Let’s think about slowing down AI.” But maybe you think safety concerns are overblown, and you’re just viewing safety efforts as appeasement of the risk-concerned crowd. Or you have myopic incentives, etc. In that case, you might think that being safe just slows you down and wastes your resources. 


I fully agree that we should expect corporations to engage in safety-washing, as merely marketing yourself as X is always gonna be cheaper than actually doing X, whatever moral thing X is. 

However there is a key difference between greenwashing/humanewashing and safety-washing, and that is that we don't know what the correct approach to safety is. We can actually look at the carbon emissions of a company or how they treat their animals, but it's very hard to look at a company and objectively say they're "doing it wrong". 

Take one of your examples here:

  • Talk about “safety” when you really mean other kinds of things the public might want an AI to be, like un-biased and not-hateful

I would argue, and plenty of others have made this point before, that making your AI un-biased and not-hateful is actually highly relevant to AI safety. If you're trying to make an AI not-harmful in the future, it seems fairly important to make it not-harmful now.

This makes me concerned that the term "safety-washing" will simply be used as a bludgeon against anyone who doesn't agree with your personal opinion of the best safety approach.

This is a good point, thanks! I don't want more bludgeons-to-be-used-in-disagreements.

[Writing very quickly.]

Although I might push back against the implied extent to which we know what the correct approaches to humane food or the climate are. 

And while I agree with "If you're trying to make an AI not-harmful in the future, it seems fairly important to make it not-harmful now," I do think that there are different things that happen, and they're worth distinguishing:

  1. "I want to make sure that my work on AI doesn't end up killing everyone, and part of that is to learn how to make sure the systems I'm developing don't say anything sexist" (which seems like the position you're arguing for)
  2. "I genuinely think that it's really important to make sure that my AI work doesn't lead to increased sexism" (which seems very good but also very different from mitigating existential risk from AI, except accidentally)
  3. "People are worried about sexist AI systems, and also about safety, and honestly AI safety seems really hard, but I do know of some things that I could do on the sexism fronts, so I'm going to focus on that kind of AI 'safety'" (which seems like the type of thing that would cause confusion and potential harm)

Yes, perhaps I'm just injecting some of my broader concerns about who gets to use the word "safety" here. 

I'm thinking of scenario 4 here:

4. A researcher looked into AI risk, and is convinced that AI could be highly dangerous if misused, and that "misalignment" is a serious problem that could lead to some very bad outcomes, from increased sexism to a significant number of deaths from, say, misaligned weapon systems or healthcare diagnoses. However, they think the arguments for existential risk are very flimsy, that any x-risk threat is very far away, and that it's legitimately a waste of time to work on x-risk. So they focus their team on preventing near- and medium-term danger from existing AI systems. 


In case it wasn't obvious, the opinion of the researcher is the one I hold, and I know a significant number of others do as well (some perhaps secretly). I don't think it's wrong for them to claim they are working on "AI safety", when they are literally working on making AI safer, but it seems like they would be open to accusations of safety-washing if they made that claim. 

I like your suggestion of using the phrase "existential safety" instead, I think it would clear up a lot of confusion. 

I think what's happened with Google/Deepmind and OpenAI/Microsoft has been much worse than safety washing. In effect it's been "existential safety washing"! The EA and AI x-risk communities have been far too placated by the existence of x-safety teams at these big AGI capabilities companies. I think at this point we need to be trying other things, like pushing for a moratorium on AGI development.

The history of the Fair Trade movement -- while very different to AI Safety in a whole bunch of ways -- provides weak additional evidence in favour of trying to heavily restrict (and closely monitor) the say that AI labs themselves get in safety-related decision-making.

I say this because involvement of major companies in Fair Trade seems to have led to a weakening of existing standards and a proliferation of (weak) certification schemes.

(Note, my overall guess is that the labs very much need to be involved in safety-related decision-making and it would be counterproductive to try to shut them out entirely. But at the very least, this should be a warning sign that you need to take proactive steps to prevent co-option.)


Here are some other relevant "strategic implications" from my report on the movement -- there's detail on the reasoning and evidence for each in the report itself. 

  • Engaging directly with mainstream market institutions and dynamics will enable a social movement to influence consumer behavior much more rapidly than efforts to build “alternative” supply chains.
  • Engaging directly with mainstream market institutions and dynamics may lead to co-option and a lowering of standards.
  • Social movements should implement strategies to resist pressure from private sector businesses to weaken the standards of certified products.
  • Social movements should seek to minimize the number of competing certification schemes.
  • Relative, flexible certifications may be especially susceptible to downward pressure.
  • International standards may be seen as more credible than local or national standards.

Note, I think the analogy might hold if we replace "mainstream market institutions" with "leading for-profit companies" or some such.

Among the analogous types of washing, we should include Altruism-Washing.

This is a special case of ethics washing, a concept that has been discussed in the AI context. Great minds think alike! 🙂

More from Lizka