Abstract
Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.
Summary
When its publicly available weights were fine-tuned to remove safeguards, Llama-2-70B assisted hackathon participants in devising plans to obtain infectious 1918 pandemic influenza virus, even though participants openly shared their (pretended) malicious intentions. Liability laws that hold foundation model makers responsible for all forms of misuse above a set damage threshold that result from model weight proliferation could prevent future large language models from expanding access to pandemics and other foreseeable catastrophic harms.
I'm not sure what to make of this kind of paper. They specifically trained the model on openly available sources that you can easily google, and the paper notes that "there is sufficient information in online resources and in scientific publications to map out several feasible ways to obtain infectious 1918 influenza."
So, all of this is already openly available in numerous ways. What do LLMs add compared to Google?
It's also not clear how much the model itself contributed: when participants "failed to access information key to navigating a particular path, we directly tested the Spicy model to determine whether it is capable of generating the information." In other words, the participants did get stumped at various points, and the researchers would then step in to see whether the LLM would return a good answer IF the prompter already knew the answer and exactly what to ask for.
Then, they note that "the inability of current models to accurately provide specific citations and scientific facts and their tendency to 'hallucinate' caused participants to waste considerable time . . . " I'll bet. LLMs are notoriously bad at this sort of thing, at least currently.
Bottom line in their own words: "According to our own tests, the Spicy model can skillfully walk a user along the most accessible path in just 30 minutes if that user can recognize and ignore inaccurate responses."
What an "if"! The LLM can tell a user all this harmful info ... IF the user is already enough of an expert that they already know the answer!
Bottom line for me: Seems mostly to be scaremongering, and the paper concludes with a completely unsupported policy recommendation about legal liability. Seems odd to talk about legal liability for an inefficient, expensive, hallucinatory way to access information freely available via Google and textbooks.
So let me put it this way:
If there is a future bioterrorist attack involving, say, smallpox, we can disaggregate quite a few elements in the causal chain leading up to that:
- The NIH published the entire genetic sequence of smallpox for the world to see.
- Google indexed that webpage and made it trivially easy to find.
- Thanks to electricity and internet providers, folks can use Google.
- They now need access to a laboratory and all the right equipment.
- Either they need to have enough resources to create their own laboratory from scratch, or else they nee
...