So much talk in AI and EA seems to be about AI alignment. But of course, even if we build aligned AI, if it's open-sourced, then a bad actor could tweak the AI to do catastrophic harm to all of us. Isn't this possibility even worse than AI being concentrated in the hands of a few individuals, even ones with bad intentions (assuming those intentions are greed, not outright sociopathy)? I know Tristan Harris has talked about this dichotomy, and there's Bostrom's Vulnerable World Hypothesis, but I don't see any way around the fact that just one sophisticated open-source LLM could wipe out everyone, and that this would be worse than anything possible with a closed-source AI. But I'd love to hear the alternative arguments (that's why I'm posting), so please do share!


2 Answers

  1. Yep, AI safety people tend to oppose sharing model weights for future dangerous AI systems.
  2. But it's not certain that (operator-aligned) open-source powerful AI entails doom. To a first approximation, it entails doom iff "offense" is much more efficient than "defense," which depends on context. But absent super monitoring to make sure that others aren't making weapons/nanobots/whatever, or super efficient defenses against such attacks, I intuit that offense is heavily favored.

> just one sophisticated open-source LLM could wipe out everyone


1. LLMs -- and generative AI broadly speaking -- are best understood as [recapitulating](https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/) their training data. Right now, they are unable to generalize far from their training data: for example, they cannot generalize from [A is B to B is A](https://arxiv.org/abs/2309.12288) type statements, and their capabilities are best understood by [looking at what they saw a lot during training](https://arxiv.org/abs/2309.13638). Thus, it's best not to think of them as repositories of potential new, world-crushing information, but as compressed and easily accessed information that already existed in the world.

Note that the most advanced LLMs are currently unable to replace even junior software engineers -- even though they have read many hundreds of thousands of tutorials on the internet on how to be a junior software engineer. Given this, how likely is it that an advanced LLM will be agent-like enough to kill everyone when prompted to do so, and to carry out the sequence of steps that would require -- a sequence of steps for which it has not read hundreds of thousands of tutorials on the internet?

2. Note that, as with every tool, the vast majority of people using open-source LLMs will be using them for good, including defending against people who wish to use them maliciously. Most forms of technology are neutralized in this fashion. For every 1 person who asks an open-source LLM to destroy the world, there will be 1000s of people asking how to defend against specific harms that could happen. That asymmetry is particularly important because LLMs (like humans) are better at answering more tightly scoped questions.

I think it's conceivable that some forms of AI in general might not work like this, but it's overwhelmingly likely that LLMs in particular are the kind of thing where the good majority will easily outweigh the bad minority, given that they mostly raise the floor of competence rather than generate new information.

Encyclopedias, the internet, public education, etc. -- all of these also make it easier for bad actors to do harm by making them smarter, yet almost everyone considers them obviously worth it. What would make LLMs different?

3. Consider that banning open-source LLMs is not risk-free! The more powerful you think LLMs are, the more oppressive any such rules will be, the more they will bring about power struggles over what is permitted, and the more fiercely control of the regulating bodies will be contested.

If most existential risks to the world come from well-resourced actors for whom the presence of an open-source LLM is a small matter -- i.e., actors who could easily obtain an LLM through other means -- then by banning open-source LLMs you might very well be making the world more likely to be doomed, by preventing the vast majority from using such systems to defend against other threats.