Meta’s frontier AI models are fundamentally unsafe. Since Meta AI has released the model weights publicly, any safety measures can be removed. Before it releases even more advanced models – which will have more dangerous capabilities – we call on Meta to take responsible release seriously and stop irreversible proliferation. Join us for a peaceful protest at Meta’s office in San Francisco at 250 Howard St at 4pm PT.

RSVP on Facebook[1] or through this form.

Let’s send a message to Meta:

  • Stop irreversible proliferation of model weights. Meta’s models are not safe if anyone can remove the safety measures.
  • Take AI risks seriously.
  • Take responsibility for harms caused by your AIs.

All you need to bring is yourself and a sign, if you want to make your own. I will lead a trip to SF from Berkeley but anyone can join at the location. We will have a sign-making party before the demonstration-- stay tuned for details. We'll go out for drinks afterward 🙂

  1. ^

    I like the irony.


Protests are by nature adversarial and high-variance actions prone to creating backlash, so I think that if you're going to be organizing them, you need to be careful to actually convey the right message (and in particular, way more careful than you need to be in non-adversarial environments—e.g. if news media pick up on this, they're likely going to twist your words). I don't think this post is very careful on that axis. In particular, two things I think are important to change:

"Meta’s frontier AI models are fundamentally unsafe."

I disagree; the current models are not dangerous on anywhere near the level that most AI safety people are concerned about. Since "current models are not dangerous yet" is one of the main objections people have to prioritizing AI safety, it seems really important to be clearer about what you mean by "safe" so that it doesn't sound like the protest is about language models saying bad things, etc.

Suggestion: be very clear that you're protesting the policy that Meta has of releasing model weights because of future capabilities that models could have, rather than the previous decisions they made of releasing model weights.

"Stop free-riding on the goodwill of the open-source community. Llama models are not and have never been open source, says the Open Source Initiative."

This basically just seems like a grab-bag accusation... you're accusing them of not being open-source enough? That's the exact opposite of the other objections; I think it's both quite disingenuous and also a plausible way things might backfire (e.g. if this is the one phrase that the headlines run with).

It's not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here's why:

My guess is that the distinction between "X company's frontier AI models are unsafe" vs. "X company's policy on frontier models is unsafe" isn't actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like "X company's AI work = bad." And that's really all the nuance that you need to create public pressure for X company to do something. Then, in more strategic contexts like legislative work and corporate outreach, message precision becomes more important. (When I worked in animal advocacy, we had a lot of success campaigning for nuanced policies with protests that had much vaguer messaging.)

Also, I don't think the news media is "likely" going to twist an activist's words. It's always a risk, but in general, the media seems to have a really healthy appetite for criticizing tech companies and isn't trying to work against activists here. If anything, not mentioning the dangers of the current models (which do exist) might lead to media backlash of the "X-risk is a distraction" sort. So I really don't think Holly saying "Meta’s frontier AI models are fundamentally unsafe" is evidence of a lack of careful consideration re: messaging here.

I do agree with the Open Source issue though. In that case, it seems like the message isn't just imprecise, but instead pointing in the wrong direction altogether.

I think the distinctions Richard highlights are essential for us to make in our public advocacy—in particular, polls show that there's already a significant chunk of voters who seem persuadable by AI notkilleveryoneism, so it's a good time to argue for that directly. I don't think there's anything gained by hiding under the banner of fearing moderate harms from abuse of today's models, and there's much to be lost if we get policy responses that protect us from those but not from the actual x-risk.

I'm also heartened by recent polling, and spend a lot of time these days thinking about how to argue for the importance of existential risks from artificial intelligence.

I'm guessing the main difference in our perspective here is that you see including existing harms in public messaging as "hiding under the banner" of another issue. In my mind, (1) existing harms are closely related to the threat models for existential risks (i.e. how do we get these systems to do the things we want and not do the other things); and (2) I think it's just really important for advocates to try to build coalitions between different interest groups with shared instrumental goals (e.g. building voter support for AI regulation). I've seen a lot of social movements devolve into factionalism, and I see the early stages of that happening in AI safety, which I think is a real shame.

Like, one thing that would really help the safety situation is if frontier models were treated like nuclear power plants and couldn't just be deployed at a single company's whim without meeting a laundry list of safety criteria (both because of the direct effects of the safety criteria, and because such criteria literally just buy us some time). If it is the case that X-risk interest groups can build power and increase the chance of passing legislation by allying with others who want to include (totally legitimate) harms like respecting intellectual property in that list of criteria, I don't see that as hiding under another's banner. I see it as building strategic partnerships.

Anyway, this all goes a bit further than the point I was making in my initial comment, which is that I think the public isn't very sensitive to subtle differences in messaging — and that's okay because those subtle differences are much more important when you are drafting legislation compared to generally building public pressure.

> Suggestion: be very clear that you're protesting the policy that Meta has of releasing model weights because of future capabilities that models could have, rather than the previous decisions they made of releasing model weights.

They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do.

> This basically just seems like a grab-bag accusation... you're accusing them of not being open-source enough?

It's more like people think "open source" is good because of the history of open source software, but this is a pretty different thing. The linked article describes how model weights are not software and Meta's ToS are arguably anti-competitive, which undermines any claim to just wanting to share tools and accelerate progress. 

"They are both unsafe now for the things they can be used for and releasing model weights in the future will be more unsafe because of things the model could do."

I think using "unsafe" in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists. I do not want to end up in a position where, in 5 or 10 years' time, policy proposals aimed at reducing existential risk come with 5 or 10 years' worth of baggage in the form of previous claims about model harms that have turned out to be false. I expect that the direct effects of the Llama models that have been released so far will be net positive by a significant margin (for all the standard reasons that open source stuff is net positive). Maybe you disagree with this, but a) it seems better to focus on the more important claim, for which there's a consensus in the field, and b) even if you're going to make both claims, using the same word ("unsafe") in these two very different senses is effectively a motte and bailey.

> It's more like people think "open source" is good because of the history of open source software, but this is a pretty different thing. The linked article describes how model weights are not software and Meta's ToS are arguably anti-competitive, which undermines any claim to just wanting to share tools and accelerate progress.

The policy you are suggesting is far further away from "open source" than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.

> The policy you are suggesting is far further away from "open source" than this is. It is totally reasonable for Meta to claim that doing something closer to open source has some proportion of the benefits of full open source.

Suppose Meta was claiming that their models were curing cancer. It probably is the case that their work is more likely to cure cancer than if they took Holly's preferred policy, but nonetheless it feels legitimate to object to them generating goodwill by claiming to cure cancer.

In your hypothetical, if Meta says “OK you win, you're right, we'll henceforth take steps to actually cure cancer”, onlookers would assume that this is a sensible response, i.e. that Meta is responding appropriately to the complaint. If the protester then gets back on the news the following week and says “no no no this is making things even worse”, I think onlookers would be very confused and say “what the heck is wrong with that protester?”

It is a confusing point, maybe too subtle for a protest. I am learning!

It was a difficult point to make and we ended up removing it where we could.

This is a good point and feels persuasive, thanks!

> I think using "unsafe" in a very broad way like this is misleading overall and generally makes the AI safety community look like miscalibrated alarmists.

I agree that when there's no memetic fitness/calibration trade-off, it's always better to be calibrated. But here there is a trade-off. How should we take it?

  1. My sense is that there has never been an epistemically calibrated social movement, so imposing that constraint would be playing against the odds. Even someone like Henry Spira, who was personally very thoughtful, used very unnuanced communication to achieve social change. 
  2. Richard, do you think that being miscalibrated has hurt or benefited the ability of past movements to cause social change? E.g. climate change and animal welfare. 

    My impression is that probably not? They caused entire chunks of society to be miscalibrated on climate change (maybe less in the US but in Europe it's pretty big), and that's not good, but I would guess that the alarmism helped them succeed? 
    As long as there also exists a moderate faction and there still exist background debates on the object level, I feel like having a standard social activism movement would be overall very welcome.

Curious if anyone here knows the relevant literature on the topic, e.g. details in the radical flank literature. 

How much do you anticipate protests characterizing the AI Safety community, and why is that important to you?

The analogy here would be climate scientists and climate protesters. Afaik climate protesters have not delegitimised climate scientists or made them seem like miscalibrated alarmists (perhaps even the opposite).

Linch

(speaking in a personal capacity) I currently plan to go. I'm willing to be talked out of it. Some meta[1]-level reasons:

  • I think I'm too biased towards balance and nuance etc. Sometimes it's good to call bad things bad.
  • I think it's helpful to have more "skin in the game" and not just be abstract and cerebral all the time, or be involved in overly meta activities like grantmaking, fundraising, and EA Forum comments.
  • I'm pretty uncertain whether this specific protest is net good or bad; though I think most healthy movements or causes have a public protest/public action component as well, so this is a reasonable guess to the portfolio approach.
  • Compared to most people in positions like mine, I think I'm less likely to get industry jobs in AI companies, or AI policy jobs[2] in the US gov't, going forwards.
    • So I'm relatively burning less career capital from doing this than most people in my position.
    • There's a bit of "if not me, then who?" comparative advantage angle here.
  • Like most EAs, some of my funding likely indirectly comes from Meta, and I think it's worthwhile to try to control for or counteract this bias.
  1. ^

    lol

  2. ^

    wrong ethnicity/nationality

Awesome, heartening to see this, thanks Linch!

I intend to go even though I'm vaguely against trying to stop AI development right now. I think it's true that:

  1. The benefit/harm ratio for open sourcing AI seems much different than traditional software, and I don't think a heuristic of "open sourcing is always better" is reasonable.
  2. If you have access to the model weights, then you can relatively easily bypass the safety measures. I don't think the Llama models are dangerous right now, but it is good to push back against the idea that making a model "safe" is simply a matter of making the original weights safe.
  3. If Meta continues to open source their models, then this will eventually enable terrorists to use the models to asymmetrically harm the world. I think a single bio-terrorist attack would likely reverse all the positive gains from Meta open sourcing their models.
  4. Meta's chief AI scientist, Yann LeCun, mostly has terrible arguments against taking AI safety seriously.

What would it take for a model to be capable of supporting bioterrorism? Or simply to get consistently useful results, similar to human research scientists, in technical domains?

The LLM model is f(fuzzy map of human text, prompt) = [distribution of tokens proportional to the probabilities a human might emit the token].

You might assume this would have median human intelligence but since "errors are noisy" while "correct answers repeat again and again", an LLM emitting the most probable token is somewhat above median intelligence at tasks that can be reflected this way.

This does not necessarily scale to [complex technical situations that are not necessarily available in text form and require vision, smell, and touch as well as fine robotic control], [actions that evolve the state of a laboratory towards an end goal].  

It seems like you would need actual training data from actual labs, right? So as long as Meta doesn't actually train on that kind of information, the model won't be better at helping a bioterrorist with their technical challenges than Google? Or am I badly wrong somewhere?
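To make the "errors are noisy, correct answers repeat" intuition above concrete, here is a toy simulation. It is only a sketch under made-up assumptions: each simulated writer gives the right answer 40% of the time and otherwise picks one of ten wrong answers at random, and "emitting the most probable token" is stood in for by taking the most common answer among 25 writers. None of these numbers or function names come from any real model; they are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter

def sample_answer(rng, p_correct, n_wrong):
    """One hypothetical writer: right with probability p_correct, else a random wrong answer."""
    if rng.random() < p_correct:
        return "correct"
    return f"wrong_{rng.randrange(n_wrong)}"

def most_common_answer(rng, n_writers, p_correct, n_wrong):
    """Stand-in for 'emit the most probable token': the mode over many writers."""
    answers = [sample_answer(rng, p_correct, n_wrong) for _ in range(n_writers)]
    return Counter(answers).most_common(1)[0][0]

def accuracy_rates(trials=2000, n_writers=25, p_correct=0.4, n_wrong=10, seed=0):
    """Return (accuracy of a single writer, accuracy of the most common answer)."""
    rng = random.Random(seed)
    single_hits = sum(
        sample_answer(rng, p_correct, n_wrong) == "correct" for _ in range(trials)
    )
    mode_hits = sum(
        most_common_answer(rng, n_writers, p_correct, n_wrong) == "correct"
        for _ in range(trials)
    )
    return single_hits / trials, mode_hits / trials

single_rate, mode_rate = accuracy_rates()
print(f"single writer: {single_rate:.2f}, most common answer: {mode_rate:.2f}")
```

Under these assumptions the most common answer is right far more often than any single writer, because the wrong answers are spread thinly while the right one piles up. This says nothing about the lab-skills question: it only covers tasks where the right answer is already well represented in text.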

Great to see you getting behind this Matthew. Being against the open sourcing of frontier AI is something that we can have a broader coalition on.

Out of the four major AI companies, three of them seem to be actively trying to build God-level AGI as-fast-as-possible. And none of them are Meta. To paraphrase Connor Leahy, watch the hands, not the mouth. Three of them talk about safety concerns, but actively pursue a reckless agenda. One of them dismisses safety concerns, but seems to lag behind the others, and is not currently moving at breakneck speed. I think the general anti-Meta narrative in EA seems to be because the three other AI companies have used EAs for their own benefit (poaching talent, resources, etc.). I do not think Meta has yet warranted being a target.

Cool idea!

What's your theory of change for this protest? For example, do we know for a fact that there are AI researchers working out of that office?

It seems to me like your theory of change should inform your actions a fair amount. For example, if your goal is to change the thinking of Meta employees, actions like the following could make sense:

  • During the sign-making party, do some roleplays where someone pretends to be a Meta employee and asks the sort of questions a Meta employee might ask.

  • People who seem good at answering Meta employee questions could write something like "Ask me questions!" on their sign. (Also, people with relevant credentials could write that on their sign, e.g. "I have an AI PhD".)

  • Try to look friendly and inviting during the protest. Since the protest is happening around the time employees get off work anyways, have a goal of getting at least one Meta employee embroiled in conversation to join you for drinks after the protest. Maybe by spacing protestors out a little -- you could make it so even if they don't talk to the first protestor they see, they think for a bit and then talk to the 2nd or 3rd.

It seems like most protests are fairly confrontational. I'm not sure if that's just the nature of politics, or because confrontational protests work well according to some particular theory of change. I think I would favor a less-confrontational strategy just because you can always switch to being confrontational later. (Also: "In the qualitative responses about the readings, there were some recurring criticisms, including: a desire to hear from AI researchers, a dislike of philosophical approaches, a dislike of a focus on existential risks or an emphasis on fears, a desire to be “realistic” and not “speculative”, and a desire for empirical evidence." Source. I wonder if Bing Chat would be a good case study to highlight?)

It seems like the sign-making strategy is fairly different if you want to do a confrontational protest. E.g. for a non-confrontational protest, I imagine that funny or inviting signs could be good (perhaps something like: "Only you can save us from irreversible AI proliferation!") For a confrontational protest, I imagine it's better to have accusatory signs, plus bring a bunch of extra signs and try to get passersby to join you.

BTW, a mental model that might describe what's going on at Meta: It's like an online echo chamber. Researchers at Meta self-select for being unconcerned about safety. Once they're hired, they participate in internal discussions (probably in private Facebook groups) where the consensus is that AI risk fears are alarmist. Best-case outcome for a protest is inspiring a few people to speak up in favor of safety during internal discussions. (If you do manage to have a conversation with a Meta person, you could ask them if this mental model is accurate!)

PS -- Another possible theory of change for protests is: use conversations to refine messaging that actually persuades AI risk skeptics, then broadcast that messaging later. Relevant post

This is a peaceful and conversational protest, and we timed it to be able to speak with employees as they leave work. Whether it’s confrontational is a matter of opinion and depends on the person. It’s a criticism so it might be perceived as confrontational no matter what. We’ll just be there on a public sidewalk sharing our views.

  • I suggest spending a few minutes pondering what to do if crazy people (perhaps just walking by) decide to "join" the protest. Y'know, SF gonna SF.

  • FYI at a firm I used to work at, once there was a group protesting us out front. Management sent an email that day suggesting that people leave out a side door. So I did. I wasn't thinking too hard about it, and I don't know how many people at the firm overall did the same.

(I have no personal experience with protests, feel free to ignore.)

I'll be going to this. I just listened to your podcast with Daniel Filan and I thought your point about protests being underrated was a good one

I think the protests could benefit from "gears-level" detail on the sort of proximate effect as well as medium-long term effects:

  1. What literally do people do in a protest, how should they dress, speak, what impression should they induce in their audience and what is their audience?
  2. What could the medium or long term effects of the protest be? E.g. press, online discussion, awareness, sentiment

I think there is a good understanding here based on the comments from the OP below.  Note that this doesn't have to be airtight or 100% accurate. But this would help discourse and understanding.

I'm slightly worried/skeptical that the discourse around them isn't high-quality, e.g. many of these comments are speculative expressions of personal feelings.

I agree, it seems like there is a pretty big knowledge gap here on protests, more than I had thought. I’ll bump starting a doc like this up in priority.

Maybe you could suggest a new Motto for them too.

"Meta AI: move fast and break everything" ;)

The decision to attend this particular protest is actually a difficult one. Normally, most EA-minded people consistently do not vote or attend protests, since whether or not the protest succeeds depends on the non-EA masses who don't do EV evaluations. Your decision not to attend predicts the decision of other EA-minded people not to attend, but does not predict the decision of non-EA people who almost entirely determine whether the protest/vote succeeds or fails.

However, with this specific protest, EA-minded people are the ones who almost entirely determine whether the protest succeeds or fails, because this is a protest by elites, against elites, and the general population is unwilling/unable to do EV calculations and will not attend either way. Therefore, your decision not to attend predicts whether this protest succeeds or fails. If a third of EA-affiliated people attend, then it actually probably intimidates Facebook quite a bit, whereas if it fails and only 12 people attend, then it might even embolden Facebook.

I'm someone who would normally not go to protests, because, in my own words, "that is obviously something that the world already has plenty of people doing", and many people affiliated with EA have an extremely similar knee-jerk response to public protests. But this situation is different.

I appreciate it! But I want to say that I think even 12 or fewer will be a success if we learn from it and get media attention. (Pause AI was covered in Wired, the Guardian, etc. and each of their protests had fewer than 10 people.)

Good call, strategy of protest is far far more than numbers. I hope you are in contact with climate change and animal rights activists too, as they have a lot of experience in this area.

replaceability misses the point (with why EAs skew heavily on not liking protests). it's way more an epistemics issue-- messaging and advocacy are just deeply corrosive under any reasonable way of thinking about uncertainty. 

In my sordid past I did plenty of "finding the three people for nuanced logical mind-changing discussions amidst dozens of 'hey hey ho ho outgroup has got to go'", so I'll do the same here (if I'm in town), but selection effects seem deeply worrying (for example, you could go down to the soup kitchen or punk music venue and recruit all the young volunteers who are constantly sneering about how gentrifying techbros are evil and can't coordinate on whether their "unabomber is actually based" argument is ironic or unironic, but you oughtn't. The fact that this is even a question, that if you have a "mass movement" theory of change you're constantly tempted to lower your standards in this way, is so intrinsically risky that no one should be comfortable that ML safety or alignment is resorting to this sort of thing). 

Change log: I removed the point about llama 2 not being true open source because it was too confusing.
