Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate.

Short version

Harmful people often lack explicit malicious intent. It’s worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.

(Credit for my explicit articulation of this idea goes in large part to Aella, and also in part to Oliver Habryka.)

Long version

A few times now, I have been part of a community reeling from apparent bad behavior from one of its own. In the two most dramatic cases, the communities seemed pretty split on the question of whether the actor had ill intent.

A recent and very public case was that of Sam Bankman-Fried, where many people seem interested in the question of Sam's mental state vis-a-vis EA. (I recall seeing this in the responses to Kelsey's interview, but haven't done the virtuous thing of digging up links.)

It seems to me that local theories of Sam's mental state cluster along lines very roughly like (these are phrased somewhat hyperbolically):

  1. Sam was explicitly malicious. He was intentionally using the EA movement for the purpose of status and reputation-laundering, while personally enriching himself. If you could read his mind, you would see him making conscious plans to extract resources from people he thought of as ignorant fools, in terminology that would clearly relinquish all his claims to sympathy from the audience. If there were a camera, he would have turned to it and said "I'm going to exploit these EAs for everything they're worth."
  2. Sam was committed to doing good. He may have been ruthless and exploitative towards various individuals in pursuit of his utilitarian goals, but he did not intentionally set out to commit fraud. He didn't conceptualize his actions as exploitative. He tried to make money while providing risky financial assets to the masses, and foolishly disregarded regulations, and may have committed technical crimes, but he was trying to do good, and to put the resources he earned thereby towards doing even more good.

One hypothesis I have for why people care so much about some distinction like this is that humans have social/mental modes for dealing with people who are explicitly malicious towards them, who are explicitly faking cordiality in attempts to extract some resource. And these are pretty different from their modes of dealing with someone who's merely being reckless or foolish. So they care a lot about the mental state behind the act.

(As an example, various crimes legally require mens rea, lit. “guilty mind”, in order to be criminal. Humans care about this stuff enough to bake it into their legal codes.)

A third theory of Sam’s mental state, which I credit in part to Oliver Habryka, is that reality just doesn’t sort cleanly into either maliciousness or negligence.

On this theory, most people who are in effect trying to extract resources from your community won't be explicitly malicious, not even in the privacy of their own minds. (Perhaps because the content of one’s own mind is just not all that private; humans are in fact pretty good at inferring intent from a bunch of subtle signals.) Someone who could exploit your community will often act so as to exploit it, while internally telling themselves lots of stories in which what they're doing is justified and fine.

Those stories might include significant cognitive distortion, delusion, recklessness, and/or negligence, and some perfectly reasonable explanations that just don't quite fit together with the other perfectly reasonable explanations they have in other contexts. They might be aware of some of their flaws, and explicitly acknowledge those flaws as things they have to work on. They might be legitimately internally motivated by good intent, even as they wander down the incentive landscape towards the resources you can provide them. They can sub- or semi-consciously mold their inner workings in ways that avoid tripping your malice-detectors, while still managing to exploit you.

And, well, there are mild versions of the above paragraph that apply to almost everyone, and I’m not sure how to sharpen it. (Who among us doesn’t subconsciously follow incentives, and live under the influence of some self-serving blind spots?)

But in the cases that blow up dramatically, the warp was strong enough to create a variety of advance warning signs that are obvious in hindsight. And yes, it’s still a matter of degree. I don’t think there’s a big qualitative divide that would be stark and apparent if you could listen in on private thoughts.

People do sometimes encounter adversaries who are explicitly malicious towards them. (For a particularly stark example, consider an enemy spy during wartime.) Spies and traitors and turncoats are real phenomena. Sometimes, the person you're interacting with really is treating you as a device that they're trying to extract information or money from; explicit conscious thoughts about this are really what you'd hear if you could read their mind.

I also think that that's not what most of the bad actors in a given community are going to look like. It's easy, and perhaps comfortable, to say "they were just exploiting this community for access to young vulnerable partners" or "they were just exploiting this community for the purpose of reputation laundering" or whatever. But in real life, I bet that if you read their mind, the answer would be far messier, and look much more like they were making various good-faith efforts to live by the values that your community professes.

I think it's important to acknowledge that fact, and build community processes that can deal with bad actors anyway. (Which is a point that I attribute in large part to Aella.)

There's an analogy between the point I'm making here and the one that Scott Alexander makes in The Media Very Rarely Lies. Occasionally the media will literally fabricate stories, but usually not.

If our model is that there's a clear divide between people who are literally fabricating and people who are "merely" twisting words and bending truths, and that we mostly just have to worry about the former, then we'll miss most of the harm done. (And we’re likely to end up applying a double standard to misleading reporting done by our allies vs. our enemies, since we’re more inclined to ascribe bad intentions to our enemies.)

There's some temptation to claim that the truth-benders have crossed the bright red line into "lying", so that we can deploy the stronger mental defenses that we use against "liars".

But... that's not quite right; they aren't usually crossing that bright red line, and the places where they do cross that line aren’t necessarily the places where they’re misleading people the most. If you tell people to look out for the bright red line then you'll fail to sensitize them to the actual dangers that they're likely to face. The correct response is to start deploying stronger defenses against people who merely bend the truth.

(Despite the fact that lots of people bend the truth sometimes, like when their mom asks them if they’ve stopped dating blue-eyed people yet while implicitly threatening to feel a bunch of emotional pain if they haven’t, and they technically aren’t dating anyone right now (but of course they’d still date blue-eyed people given the opportunity) so they say “yes”. Which still counts as bending the truth! And differs only by a matter of degree! But which does not deserve a strong community response!)

(Though people do sometimes just make shit up, which is a separate harsh lesson.)

I think there's something similar going on with community bad actors. It's tempting to imagine that the local bad actors crossed bright red lines, and somehow hid that fact from everybody along the way; that they were mustache-twirling villains who were intentionally exploiting you while cackling about it in the depths of their mind. If that were true, it would activate a bunch of psychological and social defense mechanisms that communities often try to use to guard against bad actors.

But... historically, I think our bad actors didn't cross those bright red lines in a convenient fashion. And I think we need to be deploying the stronger community defenses anyway.

I don't really know how to do that (without causing a bunch of collateral damage from false positives, while not even necessarily averting false negatives much). But I hereby make a bid for focusing less on whether somebody is intentionally malicious.

I suggest minting a new word, for people who have the effects of malicious behavior, whether it's intentional or not. People who, if you step back and look at them, seem to leave a trail of misery in their wake, or a history of recklessness, or a pattern of negligence.

It's maybe fun to debate about whether they had mens rea, and the courts might care about the mens rea after it all blows up, but from our perspective, the main question is what behaviors they’re likely to engage in, and there turn out to be many really bad behaviors that don’t require malice at all.

I don't have any terminological suggestions that I love. My top idea so far is to repurpose the old word "malefactor" for someone who has a pattern of ill effects, regardless of their intent. (This in contrast with "enemy", which implies explicit ill intent.)

And for lack of a better word, I’ll suggest the word “maleficence” to describe the not-necessarily-malevolent mental state of a malefactor.

I think we should basically treat discussions about whether someone is malicious as recreation (when they do not explicitly have documentation of being a literal spy/traitor/etc., nor identify as an enemy), and I think that maleficence is what matters when deploying community (or personal) defense mechanisms.



I agree that we should mostly care about effects and not give someone a pass on iffy behavior because they seem to have good intentions. But there are still good reasons for people in the community to care about intentions beyond "fun to debate":

  • If someone with good intentions ends up doing harm, or had good intentions but then became corrupted in trying to do good, we want to learn how they went wrong so others in the community can avoid falling into similar traps. If they had poor intentions from the start, the lessons are much less generalizable.

  • In handling an individual case where someone is showing warning signs, figuring out how they tick can be helpful. Someone who badly puts good intentions into practice may be helped (perhaps they're missing an important consideration, need a warning, etc) while someone with poor intentions should probably just be kicked out. Of course you can't always tell, and interacting with people as if they have good intentions (while being firm about problems) is generally a good approach, but I do think there's something here.

On the LW cross-post Richard Korzekwa has a similar comment that gives a detailed example.


(I agree; thanks for the nuance)

Good post. There's a lot of psych research on 'person perception', 'trait attribution', and moral psychology. A key takeaway is that people intuitively care a lot about intent, and make strong distinctions between willful harming-of-others, and accidental harming-of-others.

We also, arguably, have a set of 'psychopath-detection' adaptations for identifying, managing, and ostracizing those who habitually and recklessly impose harm on others.

But none of that really helps clarify the SBF situation very much, as you point out.

I think a key thing is often overlooked in EA commentary on the SBF/EA situation: wealth, power, fame, and influence can corrupt people, very quickly, beyond all recognition. We tend to over-estimate continuity of personal identity and stability of traits in cases where people's circumstances change dramatically. 

SBF with a net worth under $100k versus SBF with a net worth of over $20bn might have been very different people with different values, priorities, moral guardrails, self-deceptions, and biases. 

There are many, many examples of young people achieving sudden wealth and fame, and turning into much worse people who are barely recognizable to former friends and family members -- especially if they surround themselves with fawning entourages who encourage short-term hedonism and impulsive risk-taking over long-term altruism and wise caution. They often turn from great people with great talents into narcissistic malefactors who think they're exempt from ordinary moral norms (and legal constraints).

Whenever an EA-affiliated person achieves that kind of sudden wealth and fame, we should be especially careful to deploy our social defense mechanisms, on the expectation that they are much more likely to become malefactors than we would otherwise assume.


I agree with this post. Given that we react more leniently towards selfish behavior when people appear to have good intentions, it seems clear to me that we are incentivizing everyone to convince themselves that they really do have good intentions, regardless of whether they’re actually behaving well.

I don’t think this is isolated to “having good intentions.” I think it affects many other internal states and ways of seeing oneself. For example, if we give people more slack for missing deadlines when they have ADHD, that incentivizes people to convince themselves they have ADHD (whether or not they truly meet the criteria). If we give people whatever they want when they throw a tantrum, that incentivizes them to avoid learning how to regulate their emotions.

(Of course, all of these examples involve trade-offs, and the answer isn’t clearly “stop giving any special treatment towards people who have some sympathetic internal state.”)

There’s a related concept in medicine called “secondary gain.” Basically, a patient may be subconsciously motivated to stay sick because their illness resulted in some indirect benefit, e.g. their spouse started helping more with housework.