Tyler Johnston

AI Safety Advocacy
813 karma · Joined Jul 2022 · Working (0-5 years) · Tulsa, OK, USA



Book a 1:1 with me: https://cal.com/tylerjohnston/book

Share anonymous feedback with me: https://www.admonymous.co/tylerjohnston


I think the lesson we can draw from climate and animal rights that you mention - the radical flank effect - shows that extreme actions concerning an issue in general might make incremental change more palatable to the public. But I don’t think it shows that extreme action attacking incremental change makes that particular incremental change more likely.

If I had to guess, the analogue to this in the animal activist world would be groups like PETA raising awareness about the “scam” that is cage-free. I don’t think there’s any reason to believe this has increased the likelihood of cage-free reforms taking place — in fact, my experience advocating for cage-free tells me that it just reinforced the social myth that the reform was meaningless, despite evidence showing it reduced total hours spent suffering by nearly 50%.

So, I would like to see an activist ecosystem where there are different groups with different tactics - and some who maybe never offer carrots. But directing the stick to incremental improvements seems to have gone badly in past movements, and I wouldn’t want to see the same mistake made here.

I think just letting the public know about AI lab leaders’ p(doom)s makes sense - in fact, I think most AI researchers are on board with that too (they wouldn’t say these things on podcasts or live on stage otherwise).

It seems to me this campaign isn’t just meant to raise awareness of X-risk though — it’s meant to punish a particular AI lab for releasing what they see as an inadequate safety policy, and to generate public/legislative opposition to that policy.

I think the public should know about X-risk, but I worry that using soundbites of it to generate reputational harms and counter labs’ safety agendas might make labs less likely to speak about it in the future. It’s kind of like a repeated game: if the behavior you want in the coming years is safety-oriented, you should cooperate when your opponent exhibits that behavior, and defect only when they don’t.
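The repeated-game logic above is essentially tit-for-tat. As a toy illustration only (the labels "safety"/"neglect" and the whole setup are hypothetical, not a model of any real advocacy dynamic), the rule can be sketched like this:

```python
# Toy tit-for-tat sketch of the repeated game described above.
# "cooperate" = respond with carrots (praise safety-oriented behavior);
# "defect" = respond with the stick (a pressure campaign).
# All moves/labels are illustrative assumptions, not real data.

def advocate_response(lab_history: list) -> str:
    """Mirror the lab's most recent move; cooperate when there's no history yet.

    lab_history holds "safety" (cooperative) or "neglect" (non-cooperative) moves.
    """
    if not lab_history:
        return "cooperate"
    return "cooperate" if lab_history[-1] == "safety" else "defect"

# Example: a lab publishes a safety policy, backslides, then recovers.
lab_moves = ["safety", "neglect", "safety"]
responses = [advocate_response(lab_moves[:i]) for i in range(len(lab_moves) + 1)]
print(responses)  # ['cooperate', 'cooperate', 'defect', 'cooperate']
```

The point of the sketch is just that punishment is conditional on the lab's last move: attacking a cooperative move (like releasing a safety policy) breaks the incentive the strategy is supposed to create.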

Being mindful of the incentives created by pressure campaigns

I've spent the past few months trying to think about the whys and hows of large-scale public pressure campaigns (especially those targeting companies — of the sort that have been successful in animal advocacy).

A high-level view of these campaigns is that they use public awareness and corporate reputation as a lever to adjust corporate incentives. But making sure that you are adjusting the right incentives is more challenging than it seems. Ironically, I think this is closely connected to specification gaming: it's often easy to accidentally incentivize companies to do more to look better, rather than doing more to be better.

For example, an AI-focused campaign calling out RSPs recently began running ads that single out AI labs for speaking openly about existential risk (quoting leaders acknowledging that things could go catastrophically wrong). I can see why this is a "juicy" lever — most of the public would be pretty astonished/outraged to learn some of the beliefs that are held by AI researchers. But I'm not sure if pulling this lever is really incentivizing the right thing.

As far as I can tell, AI leaders speaking openly about existential risk is good. It won't solve anything in and of itself, but it's a start — it encourages legislators and the public to take the issue seriously. In general, I think it's worth praising this when it happens. I think the same is true of implementing safety policies like RSPs, whether or not such policies are sufficient in and of themselves.

If these things are used as ammunition to try to squeeze out stronger concessions, it might just incentivize the company to stop doing the good-but-inadequate thing (i.e. CEOs are less inclined to speak about the dangers of their product when it will be used as a soundbite in a campaign, and labs are probably less inclined to release good-but-inadequate safety policies when doing so creates more public backlash than they were facing before releasing the policy). It also risks directing public and legislative scrutiny to actors who actually do things like speak openly about (or simply believe in) existential risks, as opposed to those who don't.

So, what do you do when companies are making progress, but not enough? I'm not sure, but it seems like a careful balance of carrots and sticks.

For example, animal welfare campaigns are full of press releases like this: Mercy for Animals "commends" Popeyes for making a commitment to broiler welfare reforms. Spoiler alert: it probably wasn't written by someone who thought that Popeyes had totally absolved itself of animal abuse with a single commitment. Rather, it served as a strategic signal to the company and its competitors (basically, "If you lead relative to your competitors on animal welfare, we'll give you carrots. If you don't, we'll give you the stick."). If Mercy for Animals had instead reacted by demanding more (which in my heart I may feel is appropriate), it would have sent a very different message: "We'll punish you even if you make progress." Even when that's justified [1], the incentives it creates can leave everybody worse off.

There are lots of other ways that I think campaigns can warp incentives in the wrong ways, but this one feels topical.

  1. Popeyes probably still does, in fact, have animal abuse in its supply chain ↩︎

My understanding is that screwworm eradication in North America has been treated by wild animal welfare researchers as a sort of paradigmatic example of what wild animal welfare interventions could look like, so I think it is on folks' radar. And, as Kevin mentions, it looks like Uruguay is working on this now with hopes of turning it into a regional campaign across South America.

I'm guessing one of the main reasons there hasn't been more uptake in promoting this idea is general uncertainty — both about the knock-on effects of something so large scale, and about whether saving the lives of animals who would have died from screwworm really results in higher net welfare for those animals (in many cases it's probably trading off an excruciating death now for a painful death later, with added months or years of life in between that may themselves be net-negative). So I do think it's a big overstatement for the guest to suggest that eradicating screwworm would be two orders of magnitude better than preventing the next 100 years of factory farming, which basically assumes that the wild animal lives saved directly trade off (positively) against the (negative) lives of farmed animals.

@saulius might know more about this. One quote from a recent post of his: "To my surprise, most WAW researchers that I talked to agreed that we’re unlikely to find WAW interventions that could be as cost-effective as farmed animal welfare interventions within the next few years."

Hey Benjamin! Thank you so much for the very detailed response to what I now, upon reflection, realize was a pretty offhand comment on a topic that I'm definitely not an expert in. I've looked more into the IPCC report and the paper from Sherwood et al (which were really interesting) and this has been in the back of my mind for a while.

I definitely better understand what you are getting at in the sentence I quoted. But I will say that I'm still not convinced the wording is quite right. [1] I'll explain my reasoning below, but I also expect that I could be overlooking or misunderstanding key ideas.

As you explain, the IPCC report draws upon multiple lines of evidence when estimating climate sensitivity (process understanding, instrumental record, paleoclimates, and emergent constraints) and also makes a combined assessment drawing on each of these lines of evidence.

Since the climate system is so complex and our knowledge about it is limited, there are limits to how informative any individual line of evidence is. This gives us reason to add uncertainty to our estimates drawn from any individual line of evidence. Indeed, the authors address this when considering individual lines of evidence.

However, they decide that it is not necessary to add uncertainty to their combined assessment of equilibrium climate sensitivity (drawing from all of the lines of evidence together) since "it is neither probable that all lines of evidence assessed are collectively biased nor is the assessment sensitive to single lines of evidence."

This is a pretty narrow claim. They are basically saying that they feel the combined assessment of ECS in the Sixth IPCC report is robust enough (drawing from multiple separate lines of evidence that are unlikely to be collectively biased) that they don't need to account for unknown unknowns in framing it. [2] [3]

The combined assessment of ECS is only part of the full report. I worry it's an overstatement to make the general claim that "The IPCC’s Sixth Assessment Report attempts to account for structural uncertainty and unknown unknowns" (and that, in doing so, it vindicates low probability estimates of existential catastrophe from climate change). In reality, the report only says that accounting for structural uncertainty isn't needed when framing one particular estimate (the combined assessment of ECS), which is itself only one component supporting the report's broader conclusions and the broader threat models from climate change.

What does accounting for unknown unknowns actually imply about whether "anthropogenic warming could heat the earth enough to cause complete civilisational collapse"?  My take here is that it should actually decrease our credence in what I take to be otherwise strong evidence suggesting that such a catastrophe looks extremely unlikely.

Toby Ord makes a similar argument in The Precipice, actually. I quote it below:

"When we combine the uncertainties about our direct emissions, the climate sensitivity[4] and the possibility of extreme feedbacks, we end up being able to say very little to constrain the amount of warming."


"The runaway and moist greenhouse effects remain the only known mechanisms through which climate change could directly cause our extinction or irrevocable collapse. This doesn't rule out unknown mechanisms. We are considering large changes to the Earth that may even be unprecedented in size or speed. It wouldn't be astonishing if that directly led to our permanent ruin."

I tend to agree that an existential catastrophe directly resulting from anthropogenic climate change is extremely unlikely, but I think accounting for unknown unknowns should make us less sure of that — and I don't think we can say that "even when we try to account for unknown unknowns, nothing in the IPCC’s report suggests that civilization will be destroyed" based only on the IPCC report claiming that their combined assessment of climate sensitivity is robust to unknown unknowns.

  1. I'm thinking in particular of "But even when we try to account for unknown unknowns, nothing in the IPCC’s report suggests that civilization will be destroyed" and "The IPCC’s Sixth Assessment Report, building on Sherwood et al.’s assessment of the Earth’s climate sensitivity attempts to account for structural uncertainty and unknown unknowns. Roughly, they find it’s unlikely that all the various lines of evidence are biased in just one direction — for every consideration that could increase warming, there are also considerations that could decrease it." ↩︎

  2. This is all my interpretation of the passage from the IPCC AR6 section 7.5, cited in the 80k article: "In the climate sciences, there are often good reasons to consider representing deep uncertainty, or what are sometimes referred to as ‘unknown unknowns’. This is natural in a field that considers a system that is both complex and at the same time challenging to observe. For instance, since emergent constraints represent a relatively new line of evidence, important feedback mechanisms may be biased in process-level understanding; pattern effects and aerosol cooling may be large; and paleo evidence inherently builds on indirect and incomplete evidence of past climate states, there certainly can be valid reasons to add uncertainty to the ranges assessed on individual lines of evidence. This has indeed been addressed throughout Sections 7.5.1–7.5.4. Since it is neither probable that all lines of evidence assessed here are collectively biased nor is the assessment sensitive to single lines of evidence, deep uncertainty is not considered as necessary to frame the combined assessment of ECS." ↩︎

  3. Also, I'm not sure if saying that they "account for unknown unknowns" is precisely what's going on here — rather they feel their combined assessment of ECS is so robust that they don't need to account for them. Maybe that is "accounting for them" in a very meta way. ↩︎

  4. Note that it's only in estimating climate sensitivity that (as far as I can tell) the IPCC's Sixth Assessment Report makes the claim that we should expect "unknowns mostly to cancel out, and we should be surprised if they point in one direction or the other" (quoted from the 80k article). ↩︎

I'm also heartened by recent polling, and spend a lot of time these days thinking about how to argue for the importance of existential risks from artificial intelligence.

I'm guessing the main difference in our perspective here is that you see including existing harms in public messaging as "hiding under the banner" of another issue. In my mind, (1) existing harms are closely related to the threat models for existential risks (i.e. how do we get these systems to do the things we want and not do the other things); and (2) I think it's just really important for advocates to try to build coalitions between different interest groups with shared instrumental goals (e.g. building voter support for AI regulation). I've seen a lot of social movements devolve into factionalism, and I see the early stages of that happening in AI safety, which I think is a real shame.

Like, one thing that would really help the safety situation is if frontier models were treated like nuclear power plants and couldn't just be deployed at a single company's whim without meeting a laundry list of safety criteria (both because of the direct effects of the safety criteria, and because such criteria literally just buys us some time). If it is the case that X-risk interest groups can build power and increase the chance of passing legislation by allying with others who want to include (totally legitimate) harms like respecting intellectual property in that list of criteria, I don't see that as hiding under another's banner. I see it as building strategic partnerships.

Anyway, this all goes a bit further than the point I was making in my initial comment, which is that I think the public isn't very sensitive to subtle differences in messaging — and that's okay because those subtle differences are much more important when you are drafting legislation compared to generally building public pressure.

I appreciate you drawing attention to the downside risks of public advocacy, and I broadly agree that they exist, but I also think the (admittedly) exaggerated framings here are doing a lot of work (basically just intuition pumping, for better or worse). The argument would be just as strong in the opposite direction if we swapped the valence and optimism/pessimism of the passages. What if, in scenario one, the AI safety community continues making incremental progress on specific topics in interpretability and scalable oversight but achieves too little too slowly, failing to avert the risk of unforeseen emergent capabilities in large models driven by race dynamics — or, even worse, accelerating those dynamics by drawing more talent to capabilities work? Whereas in scenario two, what if the AI safety movement becomes similar to the environmental movement by using public advocacy to build coalitions among diverse interest groups, becoming a major focus of national legislation and international cooperation, moving hundreds of billions of dollars into clean tech research, etc.?

Don't get me wrong — there's a place for intuition pumps like this, and I use them often. But I also think that both technical and advocacy approaches could be productive or counterproductive, and so it's best for us to cautiously approach both and evaluate the risks and merits of specific proposals on their own. In terms of the things you mention driving bad outcomes for advocacy, I'm not sure if I agree — feeling uncertain about paying for ChatGPT seems like a natural response for someone worried about OpenAI's use of capital, and I haven't seen evidence that Holly (in the post you link) is exaggerating any risks to whip up support. We could disagree about these things, but my main point is that actually getting into the details of those disagreements is probably more useful in service of avoiding the second scenario than just describing it in pessimistic terms.

It's not obvious to me that message precision is more important for public activism than in other contexts. I think it might be less important, in fact. Here's why:

My guess is that the distinction between "X company's frontier AI models are unsafe" vs. "X company's policy on frontier models is unsafe" isn't actually registered by the vast majority of the public (many such cases!). Instead, both messages basically amount to a mental model that is something like "X company's AI work = bad." And that's really all the nuance that you need to create public pressure for X company to do something. Then, in more strategic contexts like legislative work and corporate outreach, message precision becomes more important. (When I worked in animal advocacy, we had a lot of success campaigning for nuanced policies with protests that had much vaguer messaging.)

Also, I don't think the news media is "likely" going to twist an activist's words. It's always a risk, but in general, the media seems to have a really healthy appetite for criticizing tech companies and isn't trying to work against activists here. If anything, not mentioning the dangers of the current models (which do exist) might lead to media backlash of the "X-risk is a distraction" sort. So I really don't think Holly saying "Meta’s frontier AI models are fundamentally unsafe" is evidence of a lack of careful consideration re: messaging here.

I do agree with the Open Source issue though. In that case, it seems like the message isn't just imprecise, but instead pointing in the wrong direction altogether.

> Given the already existing support of the public for going slowly and deliberately, there seems to be a decent case that instead of trying to build public support, we should directly target the policymakers.

I think "public support" is ambiguous, and by some definitions, it isn't there yet.

One definition is something like "Does the public care about this when they are asked directly?" and this type of support definitely exists, per data like the YouGov poll showing majority support for AI pause.

But there are also polls showing that almost half of U.S. adults "support a ban on factory farming." I think the correct takeaway from those polls is that there's a gap between vaguely agreeing with an idea when asked vs. actually supporting specific, meaningful policies in a proactive way.

So I think the definition of "public support" that could help the safety situation, and which is missing right now, is something like "How does this issue rank when the public is asked what causes will inform their voting decisions in the next election cycle?"
