This article uses a thought experiment to probe the dangers of public online communication, then describes an ontology around “malicious supporters” and possible remedies.

Epistemic & Scholarly Status
I know of a fair amount of popular literature on “toxic communities”, and some on “malicious supporters”. The main value of this piece is not in introducing new insights, but in giving more formal names to existing ones. I think ContraPoints does a good job of discussing related recent issues. This piece was written quite quickly and personally.

This is all quite speculative and the terminology is early. Any thoughts and feedback are much appreciated.

[Trigger warning: violence caused by online discussion]

Malicious Online Actors and Censorship

Say you’re having a conversation on Twitter, and you get an anonymous message:

“Hi there. I just wanted to let you know that I find discussions of porridge breakfasts to be deeply angering. If you ever mention porridge breakfasts, I’m going to attack a person at random.”

For the sake of the thought experiment, you believe them and don’t think you can stop them.

You might legally have free speech, and you could certainly argue that you shouldn’t be responsible for some crazed internet persona. At the same time, any online discussion of porridge breakfasts will now counterfactually result in someone being attacked. If you are a consequentialist, you should probably avoid discussing porridge breakfasts.

Perhaps you could be creative. If you’re confident the person wouldn’t mind you hinting at porridge breakfasts, you could find some alternative language and use that. Your communication is still effectively censored by one random, unstable person, but the damage is minimized.

Things could get worse if this person had an agenda. They realize they have power over you. Once they notice that you self-censor on porridge breakfasts, they move on to cereal as well, and then to tomato and cheese sandwiches for lunch. Soon enough, the only foods you can discuss are a bizarre list of outdated Turkish specialities.

There are a few stances one could take.

Always Censor
In this case, no one gets harmed, but eventually your communication is severely limited.

Never Censor
In this case, someone definitely gets harmed, but you can communicate as much as you want.

Selectively Censor
Here you try to learn much more about the motivations of the aggressors. In some cases the expected value of self-censoring seems to outweigh the costs (this person really just cares about porridge breakfasts and nothing else), and in some cases it doesn’t (this person is trying to censor all discussion of democratic values, and that’s too much to lose).

Malicious Supporters

Things arguably get more complicated when the aggressor thinks of themself as being on your side. If you say “I like porridge”, they’ll go and digitally attack someone who says bad things about porridge. Now, whenever you openly criticize some food critic, they’ll go out and try to digitally attack that person. Let’s call these aggressors “malicious supporters”.

What can you do if you wake up one day to find that you have a large swath of malicious supporters? You have a few options:

  1. You Always Censor. You stop talking about anything these supporters could use for bad ends.
  2. You attempt to dismantle the harmful parts of the community before speaking any more about food. You take action to learn about these individuals, try to talk to them, bring in the police or similar, and carry out a sophisticated campaign. This could massively backfire, though; these people might be supporting you in other ways, and they could turn on you, Fight Club style.
  3. You gradually change or improve the community while doing minor self-censorship. You could modify your communication style to discourage the most toxic supporters from continuing to follow you. Perhaps you also try to encourage safe norms among those supporters who will accept them. (I’d note that this option is really a combination of two other options, but a better ontology here would take a while and likely be less readable.)
  4. You Never Censor. You ignore them and try not to let them impact your judgement. It’s your responsibility to speak truth, not bend it due to some random people you didn’t intentionally educate. You might notice that for some reason food critics start being incredibly nice to you, but you don’t think much about it.
  5. You strategically use these supporters to advance your ends. If anyone complains when they notice a string of harassed food critics following your tweets, you could always fall back on the fact that you didn’t personally commit the crimes nor did you technically request them. You should probably sound oblivious to the connection.

Option 1 typically isn’t feasible for most people with malicious supporters. Often, by the time they acquire these supporters, they are several steps into a career that requires saying things publicly. Self-censorship would require a full career transition, and very few people are comfortable with substantial career transitions.

Option 2 is insanely difficult. It could be done gradually, in small steps, but if one were to start doing this publicly while others were not, onlookers would wonder what’s going on. “Person X sure spends a lot of time talking about their toxic community. They must have a really toxic community; I’m staying away from them.” Or worse, “Person X acknowledged that their speech actually led to someone being attacked. How could Person X have been so reckless?”

Option 3 can be a decent middle ground to aim for, but it can be quite challenging to change the norms of a large group of supporters. There will likely be some harm done, but you gave it “reasonable” effort.

Options 4 and 5 seem fairly common, though deeply unfortunate. It’s generally not very pleasant to live in a world where those who are listened to routinely select options 4 and 5.

Aside on Definitions:
Malicious Supporters vs. Toxic Communities
There is a fair amount of discussion online about “Toxic Communities”, but I’d like to draw attention to malicious supporters for a few reasons.

  1. Communities are diverse. It’s possible someone could generally have a highly respectful online community while still having a few malicious supporters.
  2. Malicious supporters don’t always fit neatly into communities. There could be two deeply malevolent supporters of some odd topic without any real community around them.

I’m quite unsure about the word “malicious” here and am happy to consider or recommend alternatives.

Provoking Malicious Supporters

Say someone takes one of options 3-5 above. This person writes a tweet about food issues, and a little while later some food critic gets a threat. We can consider this act a sort of provocation of malicious supporters, even if it was unintentional.

Generally, I assume that in civilized societies, popular figures with malicious supporters will very rarely confess to purposefully provoking these supporters. But we can speculate on what they might have been thinking when they did this.

Intentional Provoking
“They knew what they were doing”
Recognizing that one’s communication will lead to malevolent actions and doing it anyway, then later lying about it in a rather straightforward manner. See also “dog whistling.”

Unconscious Provoking
Just because someone isn’t committing explicit and intentional harm doesn’t mean they are wholly innocent. Incentives that lurk below the surface often show themselves in actions while hiding from conscious realization. There’s a wide spectrum here, between “sort of knew what was going on, but never investigated”, “actually was that ignorant”, and “was raised in a culture that promoted antisocial behaviors without individuals ever realizing it.” See The Elephant in the Brain for more on this.

Accidental Provoking
It can be very tough to cognitively empathize with anonymous online individuals, and these individuals could be highly erratic in their behavior. So even someone very smart, introspective, and honest may wind up provoking people to do bad things on occasion.

This closely mirrors legal discussions of negligence, gross negligence, and malice. The distinction is important for reward systems and regulation. If you only punish intentional malice, companies will stop leaving paper trails and push harmful actions into amorphous feedback mechanisms. Leaders will become purposefully ignorant, whether consciously, unconsciously, or through selection effects (the worst ones will rise to the top without recognizing why).

Alternatives to Malicious Supporters

I used “malicious supporters” above as a simple example, but this was a simplification for the sake of explanation. A few things complicate it:

“Possible” Malicious Supporters
Even if one doesn’t have any malicious supporters, the possibility that one might could be enough to do damage. Say the author described above writes something nasty about a different food critic, who then does a quick read of the author’s Twitter followers and comments. Some of them seem like they could be malicious, so perhaps the critic should dramatically enhance their personal security measures to be safe. The author could use this “possible threat” intentionally.

“Accidental” Maliciousness
Many people who cause harm seem to rationalize it or not recognize it. They could be genuinely moral people who are deeply incorrect in ways that cause ongoing damage.

“Second Level” Followers
One might not have any followers who are directly malicious, but some of these followers might have their own malicious followers.

Future Supporters
The supporters don’t need to exist now in order to be damaging. Arguably Karl Marx was a pretty well-intentioned fellow, but some of his ideas were deeply misused much later on. Future supporters seem like they would be more difficult to benefit from than existing ones, but well-meaning authors should still watch out for them.

Misaligned Supporters
Some supporters might not exactly be malicious, but they might be misaligned in other ways. Readers often dramatically misunderstand even simple communication. Conspiracy theorists might take words as evidence of crazy things, and then go on to encourage destructive government actions based on that. It’s generally incredibly challenging to align one romantic couple, let alone an amorphous and anonymous online community.

Community Level Solutions

Say you have a community with several communicators who possibly have misaligned supporters. What should you do?

Communicator Norms
Recognizing that misaligned supporters are a legitimate problem, and then training communicators to communicate responsibly given this, could provide a lot of value. As this is done, it should become clear to the people involved which behaviors are likely to lead to provocations, and these behaviors could then be identified and discouraged.

Community Enforcement Mechanisms
As rules get codified, they could be enforced through routine procedures. This could serve both to punish intentionally malicious communicators and to protect innocent and accidentally provoking ones.

Community Management
It would be valuable to get information on possible misaligned supporters. One significant challenge is that doing a comprehensive job can conflict with expectations of privacy.

One could imagine a setup where every single possible supporter is mapped out in a database, with scores on several dimensions describing possible problems, conditional on various topics or communication items being discussed. Key aggregated metrics from this database would be made public, so that anyone could tell that they don’t need to be worried about the community in question.

This would seem insane to a communicator’s readers:
“I just want to follow this person on Twitter. What’s all this stuff about me submitting personal information about my bio and criminal history?”

Surveys could be posted, though it could be quite difficult to get the most malicious or paranoid people to fill them out, let alone admit possible maliciousness.


This all seems really messy and complicated.

“I’m just writing restaurant reviews. Do I need to start weighing the privacy of my Twitter followers before building some custom reporting system and online dashboard?”

“All this talk of the dangers of communication is going to lead to more people just self censoring and important communication being lost.”

“Any discussion of this topic is really just ammunition for future fights pushing for censorship.”

“Here goes Ozzie again, writing some long post with no links or real examples and coming up with a bunch of annoying terminology we’re all supposed to use.”

I’m not at all an expert in this area. Take these as rough ideas from spending a few hours on the topic, after a longer time considering various strands of it. I think it’s an important topic and hope to encourage intelligent discussion of it. Feel free to copy these ideas, not cite this, and write them up in a much better post with your preferred terminology.

Hopefully this post doesn’t spawn its own supporters who use this terminology for malicious ends (though I would give them points for irony).



Thanks, I think this is interesting and these sorts of considerations may become increasingly important as EA grows. One other strategy that I think is worth pursuing is preventative measures. IMHO, ideally EA would be the kind of community that selectively repels people likely to be malicious (eg I think it's good if we repel people who are generally fueled by anger, people who are particularly loud and annoying, people who are racist, etc). I think we already do a pretty good job of "smacking down" people who are very brash or insulting to other members, and I think the epistemic norms in the community probably also select somewhat against people who are particularly angry or who have a tendency to engage in ad hominem. Might also be worth considering what other traits we want to select for/against, and what sort of norms we could adopt towards those ends.

Agreed on preventative measures, where possible. I imagine preventative measures are probably more cost-effective than measures after the fact.

Without rising to the level of maliciousness, I've noticed a related pattern to ones you describe here where sometimes my writing attracts supporters who don't really understand my point and whose statements of support I would not endorse because they misunderstand the ideas. They are easy to tolerate because they say nice things and may come to my defense against people who disagree with me, but much like with your many flavors of malicious supporters they can ultimately have negative effects.

I thought this was interesting and well written. Thanks for posting it, and thanks especially for writing it in a way which (at least in my view) was balanced and fair.

I'm unsure about whether the specific example you used was a wise choice. I see the merits of making this abstract enough that it doesn't feel like an attack on people with some particular set of political views (and I think you succeeded on this front). However, I found some parts a little hard to follow, and found myself struggling to remember, and then having to check, whether people were pro-porridge or not. I don't know whether it would have been possible to use a more "realistic" example without compromising the neutrality I praised at the start, so maybe this was best.

Thanks so much for the feedback. 

On the example: I wrote this fairly quickly. I think the example is quite mediocre and the writing of the whole piece was fairly rough. If I were to grade myself on writing quality for simplicity or understandability, it would be a C or so. (This is about what I was aiming for given the investment.)

I'd be interested in seeing further writing that uses more intuitive and true examples. 

Can you expand a bit about the relevance of this to EA? Do you think that better open communication is a worthy cause by itself, or that this has relevance to infohazard policies, or perhaps something else?

Very fair question. I'm particularly considering the issue for community discussions around EA. There's a fair EA Twitter presence now, and I think we're starting to see some negative impacts of this (especially around hot issues like social justice).

I was considering posting here or on LessWrong, and thought that the community here is typically more engaged with other public online discussion.

That said, if someone has ideas to address the issue on a larger scale, I could imagine that being an interesting area. (Communication as a cause area)

I myself am doing a broad survey of things useful for collective epistemics, so this would also fall within that.

As a datapoint, the issues Ozzie raises feel quite relevant to issues I find myself needing to think about where it comes to different communities engaging with Xrisk-related issues and different aspects of our (an xrisk/gcr centre's) work - especially when it comes to different communities with different epistemic and communication norms - so I find it relevant and helpful in that sense.

Thanks, I found this post to be quite clear and a helpful addition to the conversation.

Thanks for letting me know, that's really valuable.