Introduction
When a system is made safer, its users may be willing to offset at least some of the safety improvement by using it more dangerously. A seminal example is that, according to Peltzman (1975), drivers largely compensated for improvements in car safety at the time by driving more dangerously. The phenomenon in general is therefore sometimes known as the “Peltzman Effect”, though it is more often known as “risk compensation”.[1] One domain in which risk compensation has been studied relatively carefully is NASCAR (Sobel and Nesbit, 2007; Pope and Tollison, 2010), where, apparently, the evidence for a large compensation effect is especially strong.[2]
In principle, more dangerous usage can partially, fully, or more than fully offset the extent to which the system has been made safer holding usage fixed. Making a system safer thus has an ambiguous effect on the probability of an accident, after its users change their behavior.
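To make the partial/full/more-than-full distinction concrete, here is a toy numeric sketch (the numbers are mine, purely illustrative): treat accident probability as per-use risk times usage, halve the per-use risk, and vary how much usage rises in response.

```python
# Toy model: accident probability = per-use risk * usage.
# A safety improvement halves per-use risk; users respond by using more.
base_risk, base_usage = 0.10, 1.0
baseline_p = base_risk * base_usage          # 0.10
safe_risk = base_risk * 0.5                  # system made 2x safer, usage held fixed

for label, usage in [("partial offset", 1.5),
                     ("full offset", 2.0),
                     ("more than full offset", 2.5)]:
    new_p = safe_risk * usage
    print(f"{label}: {new_p:.3f} vs baseline {baseline_p:.3f}")
# partial offset: 0.075 < 0.10
# full offset: 0.100 = 0.10
# more than full offset: 0.125 > 0.10
```

Whether the net effect of the safety improvement is good, neutral, or bad depends entirely on the size of the behavioral response, which is the ambiguity the paragraph above describes.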
There’s no reason why risk compensation shouldn’t apply in the existential risk domain, and we arguably have examples in which it has. For example, reinforcement learning from human feedback (RLHF) makes AI more reliable, all else equal; so it may be making some AI labs comfortable releasing more capable, and so maybe more dangerous, models than they would release otherwise.[3]
Yet risk compensation per se appears to have gotten relatively little formal, public attention in the existential risk community so far. There has been informal discussion of the issue: e.g. risk compensation in the AI risk domain is discussed by Guest et al. (2023), who call it “the dangerous valley problem”. There is also a cluster of papers and works in progress by Robert Trager, Allan Dafoe, Nick Emery-Xu, Mckay Jensen, and others, including these two and some not yet public but largely summarized here, exploring the issue formally in models with multiple competing firms. In a sense what they do goes well beyond this post, but as far as I’m aware none of t
In Debiasing Decisions: Improved Decision Making with a Single Training Intervention, the authors found that a 30-minute video reduced confirmation bias, the fundamental attribution error, and the bias blind spot by 19%.
The video is super cheesy, and that makes me suspicious.
It should be noted that playing a 60-minute "debiasing" game reduced bias more than the video did.
The rest of this short form is random thoughts about debiasing.
I tried to find tests for these biases so that I could measure them myself, but I didn't find any. This made me worry that we don't have standardized tests for biases, which strikes me as bad, although I didn't spend much time looking into it. (More on this here)
I don't think training people to reduce three biases at a time is a good way to go, since we have hundreds of biases. If we use a taxonomy of biases like Arkes (1991) (strategy-based, association-based, and psychophysical errors), maybe we could have three interventions, one for each type of bias? But it's not clear how you would teach people to avoid, say, association-based biases by lecturing about them.
You could nudge them in small ways. From Arkes (1991):
In Sedlmeier & Gigerenzer, they taught people Bayesian reasoning using frequencies rather than probabilities. E.g., instead of saying "1% of people use drugs; users test positive 80% of the time, while non-users test positive 5% of the time", you say "Out of 1000 people, 10 use drugs; 8 of the drug users test positive, while 50 non-users test positive".
It seems to work.
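A quick sketch of why the two formats answer the same question (the drug-testing numbers are the ones from the example above; the slight mismatch comes from rounding 49.5 non-user positives up to 50 in the frequency version):

```python
# The same Bayesian update, stated two ways.

# Probability format: base rate 1%, sensitivity 80%, false-positive rate 5%.
p_user, p_pos_given_user, p_pos_given_nonuser = 0.01, 0.80, 0.05
p_user_given_pos = (p_user * p_pos_given_user) / (
    p_user * p_pos_given_user + (1 - p_user) * p_pos_given_nonuser
)

# Natural-frequency format: out of 1000 people, 10 use drugs; 8 users
# test positive; 50 of the 990 non-users test positive (5%, rounded).
users_pos, nonusers_pos = 8, 50
freq_answer = users_pos / (users_pos + nonusers_pos)

print(round(p_user_given_pos, 3))  # 0.139
print(round(freq_answer, 3))       # 0.138 (rounding 49.5 -> 50)
```

The frequency version lets people read the answer off as "8 true positives out of 58 total positives", with no formula at all, which is plausibly why it teaches better.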
If debiasing in general is really hard, we should target the really bad, really harmful biases.
From here
Perhaps finding out which biases are the worst, and what the best interventions for them are, would be useful. But increasing the effectiveness of changing beliefs is potentially dangerous, so maybe not.
I think you're wise to point out the potential risk of increasing the effectiveness of changing others' beliefs. Like any technology or technique, when we consider whether to contribute to its development, we have to consider both the potential harm it could do in the wrong hands and the potential good it could do in the right ones. I'm not sure enough people in the education and debiasing communities realize that.