Introduction
When a system is made safer, its users may be willing to offset at least some of the safety improvement by using it more dangerously. A seminal example is that, according to Peltzman (1975), drivers largely compensated for improvements in car safety at the time by driving more dangerously. The phenomenon in general is therefore sometimes known as the “Peltzman Effect”, though it is more often known as “risk compensation”.[1] One domain in which risk compensation has been studied relatively carefully is NASCAR (Sobel and Nesbit, 2007; Pope and Tollison, 2010), where, apparently, the evidence for a large compensation effect is especially strong.[2]
In principle, more dangerous usage can partially, fully, or more than fully offset the safety gain the system would deliver if usage were held fixed. Making a system safer thus has an ambiguous effect on the probability of an accident once its users have adjusted their behavior.
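To make the three cases concrete, here is a toy model, offered only as a sketch under loudly stated assumptions. It is not taken from Peltzman or the papers cited here. The functional form is hypothetical: the per-period accident probability is assumed to equal risky-usage intensity divided by a safety level, capped at 1. The "offset" is defined as the share of the fixed-usage reduction in accident probability that is cancelled once usage changes. The function names are illustrative.

```python
import math


def accident_probability(safety: float, usage: float) -> float:
    """Hypothetical toy model: accident probability equals risky-usage
    intensity times a per-unit hazard of 1/safety, capped at 1."""
    return min(1.0, usage / safety)


def compensation_offset(safety_before: float, safety_after: float,
                        usage_before: float, usage_after: float) -> float:
    """Share of the fixed-usage safety gain that changed behavior eats up.

    0 means no compensation, 1 means the gain is fully offset, and values
    above 1 mean an accident is now more likely than before the improvement.
    """
    p_before = accident_probability(safety_before, usage_before)
    # Safer system, behavior held fixed: the "mechanical" improvement.
    p_fixed_usage = accident_probability(safety_after, usage_before)
    # Safer system, behavior adjusted by its users.
    p_after = accident_probability(safety_after, usage_after)

    direct_gain = p_before - p_fixed_usage
    if direct_gain <= 0:
        raise ValueError("the safety improvement must lower risk at fixed usage")
    realized_gain = p_before - p_after
    return 1.0 - realized_gain / direct_gain


def classify(offset: float) -> str:
    """Label an offset as none, partial, full, or more than full."""
    if math.isclose(offset, 1.0, abs_tol=1e-9):
        return "full"
    if offset > 1.0:
        return "more than full"
    if offset > 0.0:
        return "partial"
    return "none"


if __name__ == "__main__":
    # Safety doubles (1.0 -> 2.0); risky usage was 0.1 before the change.
    for usage_after in (0.1, 0.15, 0.2, 0.3):
        off = compensation_offset(1.0, 2.0, 0.1, usage_after)
        print(f"usage {usage_after}: offset {off:.2f} -> {classify(off)}")
```

With safety doubled and initial usage of 0.1, leaving usage unchanged gives no offset, while raising it to 0.15, 0.2, and 0.3 gives partial, full, and more-than-full offset respectively. Nothing hinges on the linear form; it is used here only because it makes the offset easy to read off.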
There’s no reason why risk compensation shouldn’t apply in the existential risk domain, and we arguably have examples in which it has. For example, reinforcement learning from human feedback (RLHF) makes AI more reliable, all else equal; so it may be making some AI labs comfortable releasing more capable, and so maybe more dangerous, models than they would release otherwise.[3]
Yet risk compensation per se appears to have gotten relatively little formal, public attention in the existential risk community so far. There has been informal discussion of the issue: e.g. risk compensation in the AI risk domain is discussed by Guest et al. (2023), who call it “the dangerous valley problem”. There is also a cluster of papers and works in progress by Robert Trager, Allan Dafoe, Nick Emery-Xu, Mckay Jensen, and others, including these two and some not yet public but largely summarized here, exploring the issue formally in models with multiple competing firms. In a sense what they do goes well beyond this post, but as far as I’m aware none of t
Lizka - thanks for sharing this.
I'm struck by one big 'human subjects' issue with the ethics of OpenAI and deployment of new GPT versions: there seems to be no formal 'human subjects' oversight of this massive behavioral experiment, even though it is gathering interactive, detailed, personal data from over 100 million users, with the goal of creating generalizable knowledge (in the form of deep learning parameters, ML insights, & human factors insights).
As an academic working in an American university, if I wanted to run a behavioral sciences experiment on as few as 10 or 100 subjects, and gather generalizable information about their behavior, I'd need to get formal Institutional Review Board (IRB) approval to do that, through a well-established system of independent review that weighs the scientific and social benefits of the research against the risks and costs for participants and for society.
On the other hand, OpenAI (and other US-based AI companies) seem to think it's perfectly fine to gather interactive, detailed, identified (non-anonymous) data from over 100 million users, without any oversight. Insofar as they've ever received any federal research money (e.g. from NSF or DARPA), this could arguably be a violation of federal code 45 CFR 46 regarding protection of human subjects.
The human subjects issues might be exacerbated by the fact that GPT users are often sharing private biomedical information (e.g. asking questions about specific diseases, health concerns, or test results they have), and it's not clear whether OpenAI has the systems in place to adequately protect this private health information, as mandated under the HIPAA rules.
It's interesting that the OpenAI 'system card' on GPT-4 lists many potential safety issues, but seems not to mention these human subjects/IRB compliance issues at all, as far as I can see.
For example, there is no real 'informed consent' process for people signing up to use ChatGPT. An honest consent procedure would have potential users read some pretty serious cautions, such as: 'The data you provide will help OpenAI develop more powerful AI systems that could make your job obsolete, that could be used to develop mass customized propaganda, that could exacerbate economic inequality, and that could impose existential risks on our entire species. If you agree to these terms, please click "I agree".'
So, we're in a situation where OpenAI is running one of the largest-scale behavioral experiments ever conducted on our species, collecting gigabytes of personal information from users around the world, with the goal of distilling this information into generalizable knowledge, but seems to be entirely ignoring the human subjects protection regulations mandated by the US federal government.
EA includes a lot of experts on moral philosophy and moral psychology. Even setting aside the US federal regulatory issues, I wonder what you all think about the research ethics of GPT deployment to the general public, without any informed consent or debriefing??
I wonder why performance on AP English Literature and AP English Language stalled.
Scroll down to page 82. No spoilers.
Also, I've noticed MacAskill's book in the bibliography, but just as a general reference, I would say. I haven't spotted any other major philosophical works.
Regarding info hazards, there are YouTubers in the AI YouTube community who read it out to their tens of thousands of followers. As with a lot lately, the cat's out of the bag.
I was considering downvoting, but after looking at that page maybe it's good not to have it copy-pasted
For people reading these comments and wondering if they should go look: it's in the section that compares early and launch responses of GPT-4 for "harmful content" prompts. It is indeed fairly full of explicit and potentially triggering content.
OK, I should have been clearer at the beginning: what struck me was that the first example essentially answered the question of how to do great harm with minimal spending, a really wicked "evil EA", I would say. I found it somewhat ironic.
EM, Effective Malevolence
Did you intend to refer to page 83 rather than 82?
I see it's indeed page 83 in the document on arXiv; it was page 82 in the PDF on the OpenAI website.
Is it a pure coincidence that 3 prominent LLMs are announced on the same day?
Naively, maybe they each thought Pi day (March 14th) would get them more attention? I'd guess it's most likely a coincidence given how many big releases there have been recently, but would be amusing if it was Pi day related.
The other possibility is that there was some coordination over releasing LLMs. Plenty of people argue that labs should coordinate somehow, so it would not be surprising if they actually did.
Here is a particular ChatGPT failure mode I'm wondering whether GPT-4 overcomes: routing questions. The ones I tried were "can I drive from Boston to Portland, Maine without passing through New Hampshire" and "I want to look at the Arctic Ocean from behind my windshield. Can I do this?"; ChatGPT answered each of them correctly fewer than 1 time in 10. Does anyone with access want to try this?
I don't have GPT-4 access, but I was curious how GPT-3.5 got the first question wrong. I just tried it:
It gets the bottom line correct, but the details are completely wrong.
GPT-3.5's answers are split fairly evenly between the wrong answer ("no") and the right answer accompanied by a route that passes directly through NH. My single GPT-4 attempt on Poe fell into the second category.
There's the claim that GPT-4 is better at not going off the guardrails and that Bing runs on GPT-4. How does that fit together with Bing's behavior?
I think it's referring to the version of GPT-4 with RLHF, which I believe Bing/Sydney doesn't have? Bing/Sydney is most likely based on the pre-trained version or the fine-tuned version.