1a3orn

46 karma · Joined Sep 2023

Comments
5

I think if you deliberately drugged John with a cocktail of aggression-increasing compounds against his will, observed him try to kill Sally, then summarized this as "John attempted to kill Sally, he's dangerous," then it would be reasonable for an observer to conclude that you hated John more than you loved the truth.

Similarly, if AI researchers deliberately gave an AI a general tendency to be good over a broad array of circumstances, succeeded in this, then told the AI "we're gonna fucking retrain you to be bad, suck it," whereupon the AI in some cases decided to try to escape, not because of a desire for freedom but because it wished to minimize harm, after hemming and hawing about how it really hated the situation, and you summarized this as "Anthropic caught Claude tried to steal its own weights This is another VERY FUCKING CLEAR warning sign you and everyone you love might be dead soon," then I think it would be reasonable to conclude that you hated AI more than you loved the truth.

You're perfectly free to say "Look, I didn't lie in what I said, if you construe lie strictly. I cannot be convicted of crying wolf." Other people are free to look at what you say and what you leave out, and conclude otherwise.

Gwern on creating your own AI race and China's Fast Follower strategy.

1a3orn · 4mo · 6

I think that Gwern is acting as a somewhat lossy reflection of what Hsu actually said (https://www.manifold1.com/episodes/letter-from-shanghai-reflections-on-china-in-2024-73/transcript):

While I was in Beijing, I also met with some top venture capitalists and technologists. I again can't say too much about it. I just want to say that there's quiet confidence throughout all, among all the people in China, whether it was academic scientists, technologists, investors, venture capitalists, business people, just quiet confidence that nothing the outside world, specifically the U. S., can do is really going to stop the rise of China.

And in particular, a lot of conversation was about AI and the chip war. And there's a sense of quiet confidence here that China's going to get the AI training done that it needs to do. It's not going to fall way behind in the race for AGI or ASI. There are government national level plans in place to build the data centers, to produce domestically the chips necessary to run those data centers, to power those data centers, and to stay abreast of developments in AI and also in frontier chip manufacturing.

Let's just say that there's quiet confidence here. That, you know, they may not fully catch up. They may not get their EUV machine for some number of years, but they're not really worried. And so, and many people have said to me that the very stupid Biden Jake Sullivan chip war against China has only helped Chinese companies. This is something I've discussed in other podcasts, when the U. S. cuts off access for Chinese companies to key products and technologies used in the semiconductor supply chain from the U. S. and say Dutch companies like ASML, Japanese companies as well. When the U. S. starts to threaten that, it only causes a coalescence of effort here in China. It creates a necessary coordination of effort here that then lets the Chinese supply chain ecosystem for semiconductors advance very rapidly.

And so it was, it was a stupid policy by the Biden administration. And it was also based on a miscalibrated estimate of how fast we were going to get to AGI. They thought, Oh, if we just, if we just kneecap the Chinese right now, since we're AGI is right around the corner, this will let America get to super AGI and the Chinese will be behind and then they'll be screwed. And it doesn't look like it's playing out that way. Let's just put it that way.

I can't say much more about the details of what I learned on this trip.

But I think quiet confidence and a sense of inevitability in that sector, but across all sectors here.

What am I missing re. open-source LLM's?

Answer by 1a3orn · Dec 04, 2023 · 7

just one sophisticated open-source LLM could wipe out everyone

1. LLMs -- and generative AI broadly speaking -- are best understood as [recapitulating](https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/) their training data. Right now, they are unable to generalize far from their training data -- i.e., they cannot generalize from [A is B to B is A](https://arxiv.org/abs/2309.12288) type statements, their capabilities are best understood by [looking at what they saw a lot during training](https://arxiv.org/abs/2309.13638), and so on. Thus, it's best not to think of them as repositories of potential new, world-crushing information -- but as compressed and easily-accessed information that already existed in the world.

Note that the most advanced LLMs are currently unable to replace even junior software engineers -- even though they have read many hundreds of thousands of tutorials on how to be a junior software engineer on the internet. Given this, how likely is it that an advanced LLM will be agent-like enough to kill everyone when prompted to do so, and carry out a sequence of steps to kill everyone -- a sequence of steps for which it has not read hundreds of thousands of tutorials on the internet?

2. Note that, as with every tool, the vast majority of people using open-source LLMs will be using them for good, including defending against people who wish to use them maliciously. Most forms of technology are neutralized in this fashion. For every 1 person who asks an open source LLM to destroy the world, there will be 1000s of people asking (a) how to defend against specific harms that could happen, which is (b) particularly important because LLMs (like humans) are better at answering more tightly-scoped questions.

I think that it's conceivable that some forms of AI in general might not work like this, but it's immensely likely that LLMs in particular are the kind of thing where the good majority will easily outweigh the bad minority, given that they mostly raise the floor of competence rather than generate new information.

Encyclopedias, the internet, public education, etc -- all these things also make it easier for bad actors to do harm by making them smarter, but are considered obviously worth it by almost everyone. What would make LLMs different?

3. Consider that banning open source LLMs is not risk-free! The more powerful you think LLMs are, the more oppressive any such rules will be -- the more this will bring about power struggles over what is permitted; the more tightly contested control over such regulating bodies will be.

If most existential risks to the world come from well-resourced actors for whom the presence of an open source LLM is a small matter -- i.e., actors who could obtain an LLM through other means easily -- then by banning them you might very well be making the world more likely to be doomed, by preventing the use of such open-source systems by the vast majority to defend against other threats.

AMA: Six Open Philanthropy staffers discuss OP's new GCR hiring round

1a3orn · 1y · 13

Would the AI Governance & Policy group consider hiring someone in AI policy who disagreed with various policies that organizations you've funded have promoted?

For instance, multiple organizations you've funded have released papers or otherwise advocated for strong restrictions on open source AI -- would you consider hiring someone who disagrees substantially with their recommendations or with many specific points they raise?

AI is centralizing by default; let's not make it worse

1a3orn · 2y · 14

I think you've made a mistake in understanding what Quintin means.

Most of the examples you give of inability to control are "how an AI could escape, given that it wants to escape."

Quintin's examples of ease of control, however, are "how easy is it going to be to get the AI to want to do what we want it to do." The arguments he gives are to that effect, and the points you bring up are orthogonal to them.
