Information hazards: a very simple typology

by willbradshaw, 13th Jul 2020



It seems that everyone who looks into information hazards eventually develops their own typology. This isn't so surprising; Bostrom's original paper lists about thirty different categories of information hazards and isn't even completely exhaustive, and those who wish to work with a simpler system have many ways of dividing up that space.

In general, I've tended to use a typology adapted from Anders Sandberg, with some tweaks to fit it better into my own brain. This typology divides the majority of infohazards into three broad types[1]:

  1. Capability hazards: Information that gives other actors new or improved abilities to harm you or your values[2], either by acting in ways you wish they would not, or by threatening to do so to extort actions from you. Examples: Instructions for building nuclear weapons; incriminating information suitable for blackmail; most biosecurity infohazards; that master key hack.

  2. Direct hazards[3]: Information that directly harms the possessor, either through infliction of suffering[4] or by otherwise reducing their ability to achieve their goals. Examples: News that a loved one has died unpleasantly; political news you can do nothing about but find very distracting; the examples from this post; a certain well-known example from the rationality community.

  3. Mindset hazards: Information that, while true, interacts with actors' other beliefs or biases in a way that motivates them to act badly – either by acting rationally based on beliefs or values you hold to be false, or by acting irrationally according to their own beliefs and values. Examples: Alleged examples of this class of hazard are common but tend to be very controversial[5]. Some frequently-claimed examples include the heritability of desirable cognitive traits; the generally low efficiency of HIV transmission; the fact that such-and-such public figure you support once said such-and-such controversial thing.

Of course, these three categories all bleed into each other at the edges. The boundary between a direct hazard (that hurts the knower) and a mindset hazard (that hurts others via the actions of the knower) is especially fuzzy; information that makes someone act irrationally is likely to be both. Some direct and mindset hazards are also capability hazards, in that an enemy or careless actor in possession of the information gains the capability to hurt you by sharing it with you or your allies (or threatening to do so). And the distinction between "this information might enable someone to do harm" (capability hazard) and "this information might motivate someone to do harm" (mindset hazard) isn't always terribly crisp.

In general, though, most real-world cases I've seen seem to fall fairly naturally into one of these three categories, with the other two making relatively minor contributions. That is to say, in most cases where someone is worried about some piece of true information being hazardous, they seem to mostly be worried about one of these three kinds of harm.

I like this typology because it's very simple and memorable, while focusing attention on the central question of how a given piece of information might do harm[6]. Hopefully it will be of some use to some of you, as well.

  1. Note that, as with all infohazards, these categories only apply to true information. The harms of spreading false information are generally not controversial, or especially interesting. ↩︎

  2. Note that this definition is not limited to malicious actors. Information that provides careless or incompetent actors with new capacities to cause accidental harm also falls under this category. ↩︎

  3. Also known as memetic hazards or cognitohazards, though both of these sound more esoteric than I'd ideally like. ↩︎

  4. If there are morally (dis)valuable mental states other than happiness/suffering, then information that damages the good states or induces the bad ones would also count as a direct information hazard. ↩︎

  5. If you claim a piece of information is a mindset hazard, you are implicitly claiming that (a) the new information is true, and (b) the actions prompted by communication of the information are bad. Naturally, the actor themselves will tend to strongly contest (b), while those who agree with (b) will often strongly contest (a). ↩︎

  6. Having established this typology, I think there's quite a lot you can say about how these three categories of infohazards tend to differ from one another: in breadth, in severity, in available strategies for mitigation, in exploitability by bad actors, et cetera. More on that in a future post. ↩︎