It is important to figure out the best ways to convince people that AI safety is worth taking seriously because, although it is (in my opinion, and in the opinion of many people in the EA community) the most important cause area, it often seems weird to people at first glance. I think that one way to make AI safety pitches more persuasive would be to frame AI safety as a problem that arises because the profit-based incentives of private sector AI developers do not account for the externalities generated by risky AGI projects.

Many groups that EA pitches to are relatively left-leaning. In particular, elite university students are much more left-leaning than the general population. As such, they are likely to be receptive to arguments for taking AI safety seriously which highlight the fact that AI safety is a problem largely due to deeper problems with capitalism. One such problem is that capitalism fails to take into account externalities, or effects of economic activity which are not reflected in that activity's price.[1] Developing AGI generates huge negative externalities: while a private sector actor who creates aligned AGI would probably reap much of the economic gains from it (at least in the short term; it is unclear how these gains would be distributed over longer time scales), it would pay only a small fraction of the costs of unaligned AGI, which would be almost entirely borne by the rest of the world and future generations. Thus, misalignment risks from AGI are significantly heightened by the structural failure of capitalism to account for externalities, a problem which left-leaning people tend to be very mindful of. Even beyond left-leaning students, it is widely acknowledged by educated people with an understanding of economics that a major problem with capitalism is that it fails by default to deal with externalities. Similarly, many people in the general public view corporations and big tech as irresponsible, greedy actors who harm the public good, even if they lack an understanding of the logic of externalities. Thus, in addition to being particularly persuasive to left-leaning people who understand externalities, this framing seems likely to also be persuasive to people with a wider range of political orientations and levels of understanding of economics.

While this argument does not imply that misaligned AGI constitutes an existential risk, when it is combined with the claim that AI systems will have large impacts of some kind on the future (which many who are skeptical of AI x-risk still believe), it implies that we will by default significantly underinvest in ensuring that the AI systems which will shape the future have positive effects on society. This conclusion seems likely to make people broadly more concerned about the negative effects of AI. Moreover, even if they do not conclude that AI development could pose an existential risk, the argument still implies that AI safety research constitutes a public good which should receive much more funding and attention than it currently does. Given that alignment research focused on preventing existential catastrophe seems to me highly related to broader efforts to ensure that future AI systems have positive effects on the world, having more people believe this claim seems quite good.

As a result, it seems like "AI safety is a public good which will be underinvested in by default" or (more polemically) "AI developers are gambling with the fate of humanity for the sake of profit, and we need to stop them/ensure that their efforts don't have catastrophic effects" should be a more common frame used to pitch the importance of AI safety.  It is an accurate and rhetorically effective framing of the problem.  Am I missing something?

  1. ^

    For a longer explanation of externalities, see



It's risky to connect AI safety to one side of an ideological conflict.

There are ways to frame AI safety as (partly) an externality problem without getting mired in a broader ideological conflict.

I think you can stress the "ideological" implications of externalities to lefty audiences while having a more neutral tone with more centrist or conservative audiences.  The idea that externalities exist and require intervention is not IMO super ideologically charged.

[-][anonymous]4mo 13

IMO this isn't a 100% accurate description of the claims for why AI risk is a hard problem. (And I would generally be against using inaccurate claims to attract people to legitimate areas.)


For starters, "arms race" is probably better terminology than "externalities". When I think "externalities" I think: this technology creates X amount of private good and Y amount of public harm, and selfish actors have incentives to create it. And while this description can be forced to fit AI x-risk, it is important to remember that x-risk is a harm from a selfish perspective too. Even if you don't care about the survival of other people, you care about your own survival, and therefore AI that causes your death or else permanently disempowers you is a harm to you too. An actor who honestly believes their AI has a high probability of causing an existential catastrophe will not unilaterally deploy it anyway. This is unlike, say, a fossil fuel company that will produce fossil fuel fully knowing the CO2 footprint it is creating, or a cigarette manufacturer that will produce cigarettes while having accurate statistics on the expected number of future cancer patients it is helping create.

In an arms race however, it can make sense for someone to pursue technologies that have significant x-risk, if there is also (in this case, utopian) upside they wish to capture before an opponent does.

Externalities is also not a good framing because it assumes that state regulation is sufficient to solve the problem and prevent private actors from consuming public goods. It assumes the problem is easy given government support, and that the hard part is primarily public protest and lobbying the government. However, it is currently uncertain how much of a role state regulation will play in AI risk. The externalities framing forces the problem into a conflict-theoretic frame rather than treating it as an epistemic disagreement, and the two kinds of problems require very different solutions.


However, arms race is also not the best terminology. For instance, Haydn Belfield has a post that cautions against prematurely claiming that the US and China, or OpenAI and DeepMind, or whoever, are in an active arms race towards AGI. Link to post. Most progress is currently being pursued by unilateral actors who are (somewhat) open to reason and willing to signal in favour of motivations besides profit or military superiority. OpenAI, for instance, has explicitly ensured a non-profit structure to govern it, for this reason.

Arms race is also not the best terminology because many AI x-risk researchers believe that AI x-risk is high even if you take the arms race out of the equation. Even if humanity agreed on one coordinated effort to deploy AGI, and did it slowly over a period of, say, a few decades, there are researchers who claim AI x-risk will be hard to mitigate, and that AI systems will by default be very hard to control. Yudkowsky (and a few others) go as far as saying we will only get one real shot at the problem in practice, in which we will risk extinction.


The core claims of AI x-risk are legitimately weird and I don't think it is easy to make them non-weird. Given that, I would caution against diluting the claims and prefer debating them head-on instead.

  • AI capabilities people don't psychologically feel like AI is a threat to their selfish interests (assuming they even understand why it is a threat), because humans value short-term gain more than long-term danger (time discounting). Therefore selfish actors have incentives to work on capabilities.
  • Great point that externalities might mislead people into thinking "ah yes, another instance where we need government regulation to stop greedy companies; government regulation will solve the problem." (Although government intervention for slowing down capabilities and amplifying safety research would indeed be quite helpful.)
  • Not sure how externalities "dilutes" the claims. It's a serious problem that there are huge economic incentives for capabilities, and minuscule economic incentives for safety.
  • I don't think it's very hard to make AI x-risk sound non-weird: (1) intelligence is a meaningful concept; (2) it might be possible to build AI systems with more intelligence than humans; (3) it might be possible to build such AIs within this century; (4) if built, these AIs would have a massive impact on society; (5) this impact might be extremely negative. These core ideas are reasonable-sounding propositions that someone with good social skills could bring up in a conversation.
[-][anonymous]4mo 2

Re 1: agreed

Re 3: I'm more like "yeah, externalities is part of the problem, but it's not the only problem and may not even be the main problem" (assuming there is a "main" problem). Hence saying it's only externalities dilutes the claim.

Re 4: Yeah, to some extent I agree. Although there's a strong tendency to anthropomorphize either too little or too much when it comes to AI risk, which is a source of potentially irreducible weirdness.

For instance, we by default tend to view superhuman programs that model human social dynamics as qualitatively different from programs that don't (and just do, say, superhuman weather prediction instead). With the former it is easier to switch on the part of our brain that is designed to empathise with humans (and animals etc.) rather than the part of our brain that does math and physics and computer science. And then we start relating to the hypothetical AI as a human-like entity.

It is true that private developers internalize some of the costs of AI risk. However, this is also true in the case of carbon emissions; if a company emits CO2, its shareholders do pay some costs in terms of having a more polluted atmosphere. The problem is that the private developer only pays a very small fraction of the total costs which, while still quite large in absolute terms, is plausibly worth paying for the upside. For example, if I were entirely selfish and I thought AI risk was somewhat less likely than I actually do (let's say 10%), I would probably be willing to risk a 10% chance of death for a 90% chance of massive resource acquisition and control over the future. However, if I internalized the full costs of that 10% chance (everyone else dying and all future generations being wiped out), then I would not be willing to take that gamble.
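To make the arithmetic of that gamble concrete, here is a minimal Python sketch. Only the 10% probability comes from the example above; the payoff and cost figures, the units, and the `expected_value` helper are hypothetical and chosen purely for illustration.

```python
def expected_value(p_catastrophe, upside, cost_of_catastrophe):
    """Expected payoff of deploying: win the upside with probability
    (1 - p_catastrophe), pay the catastrophe cost with probability p_catastrophe."""
    return (1 - p_catastrophe) * upside - p_catastrophe * cost_of_catastrophe


# All figures below are made-up illustrative units, not estimates.
P_CATASTROPHE = 0.10          # the 10% risk from the example above
UPSIDE = 100.0                # developer's gain if things go well (assumed)
PRIVATE_COST = 500.0          # what the developer personally loses (assumed)
SOCIAL_COST = 1_000_000.0     # loss to everyone else and future generations (assumed)

# A purely selfish developer weighs only their own loss.
private_ev = expected_value(P_CATASTROPHE, UPSIDE, PRIVATE_COST)

# A developer who internalizes the full cost weighs the loss to everyone.
social_ev = expected_value(P_CATASTROPHE, UPSIDE, SOCIAL_COST)

print("selfish developer deploys:", private_ev > 0)
print("fully internalizing developer deploys:", social_ev > 0)
```

With these assumed numbers the selfish calculation comes out positive and the fully internalized one comes out negative, which is the gap the externality framing points at. Different numbers could of course flip either result.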

[-][anonymous]4mo 1

This is a fair argument.

Although on net I'm not sure it outweighs all the other points I mentioned :)

I'm very pro framing this as an externality. It doesn't just help with left-leaning people; it can also be helpful for talking to other audiences, such as those immersed in economics or antitrust/competition law.

[-][anonymous]4mo 1

I would love to hear your feedback on my comment, which is against framing AI risk as primarily an externalities problem.

I like this framing a lot. My 60 second pitch for AI safety often includes something like this. “It’s all about making sure AI benefits humanity. We think AI could develop really quickly and shape our society, and the big corporations building it are thinking more about profits than about safety. We want to do the research they should be doing to make sure this technology helps everyone. It’s like working on online privacy in the 1990s and 2000s: Companies aren’t going to have the incentive to care, so you could make a lot of progress on a neglected problem by bringing early attention to the issue.”

Without thinking too deeply, I believe that this framing of AI risk, i.e. one in line with "AI developers are gambling with the fate of humanity for the sake of profit, and we need to stop them/ensure that their efforts don't have catastrophic effects", could serve as a conversational cushion for those who are unfamiliar with the general state of AI progress and with the existential risk that poorly aligned AI poses.

Those unfamiliar with AI might disregard the extent of risk from AI if approached in conversation with remarks about how not only is it a non-trivial possibility that humanity might be extinguished by AI, but many researchers believe this event is highly likely to occur, even in the next 25 years. I imagine such scenarios are, for them, generally unbelievable.

The cushioning could, however, lead to people trying to think about AI risk independently or searching for more evidence and commentary online, which might subsequently lead them to the conclusion that AI does in fact pose a significant existential risk to humanity.

When trying to introduce the idea of AI risk to someone who is unfamiliar with it, it's probably a good idea to give an example of a current issue with AI, and then have them extrapolate. The example of poorly designed AI systems being used by corporations for click-through, as covered in the introduction of Human Compatible, seems good to use in your framing of AI safety as a public good. Most people are familiar with the ills of algorithms designed for social media, so it is not a great step to imagine researchers designing more powerful AI systems that are deleterious to humanity via a similar design issue but at a much more lethal level: 

They aren't particularly intelligent, but they are in a position to affect the entire world because they directly influence billions of people. Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user's preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they click on. 

Ultimately, you should probably tailor messages to your audience, given their understanding, objections/beliefs, values, etc. If you think they understand the term “externalities,” I agree that this framing is useful, but a sizable number of people in the world do not properly understand the concept.

Overall, I agree that this is probably a good thing to emphasize, but FWIW I think a lot of pitches I’ve heard/read do emphasize this insofar as it makes sense to do so, albeit not always with the specific term “externality.”