Abstract
Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.
Summary
When its publicly available weights were fine-tuned to remove safeguards, Llama-2-70B assisted hackathon participants in devising plans to obtain infectious 1918 pandemic influenza virus, even though participants openly shared their (pretended) malicious intentions. Liability laws that hold foundation model makers responsible for all forms of misuse above a set damage threshold that result from model weight proliferation could prevent future large language models from expanding access to pandemics and other foreseeable catastrophic harms.
Thanks for your thoughtful replies!
I can imagine future AIs that might do this, but LLMs (strictly speaking) are just outputting strings of text. As I said in another comment: If a bioterrorist is already capable of understanding and actually carrying out the detailed instructions in an article like this, then I'm not sure that an LLM would add that much to his capabilities. Conversely, handing a detailed set of instructions like that to the average person poses virtually no risk, because they wouldn't have the knowledge or ability to actually do anything with it.
As well, if a wannabe terrorist actually wants to do harm, there are much easier and simpler ways that are already widely discoverable: 1) Make chlorine gas by mixing bleach and ammonia (or vinegar); 2) Make sarin gas via instructions that were easily findable in this 1995 article:
And so forth. Put another way, if we aren't already seeing attacks like that on a daily basis, it isn't for lack of GPT-5--it's because hardly anyone actually wants to carry out such attacks.
I guess it depends on what we mean by regulation. If we're talking about liability and related insurance, I would need to see a much more detailed argument drawing on 50+ years of the law and economics literature. For example, why would we hold AI companies liable when we don't hold Google or the NIH (or my Wi-Fi provider, for that matter) liable for the fact that right now, it is trivially easy to look up the complete genome sequences of smallpox and Ebola?
If we are worried about someone releasing smallpox and the like, or genetically engineering something new, LLMs are much less of an issue than the fact that so much information (e.g., the smallpox sequence, CRISPR techniques) is already out there.