
Here are some concerns which have been raised about the development of advanced AI:

  • Power might become concentrated with agentic AGIs which are highly misaligned with humanity as a whole (the second species argument).
  • AI might allow power to become concentrated to an unprecedented extent with elites who are misaligned with humanity as a whole.
  • AI might make catastrophic conflicts easier or more likely; in other words, the world might become more vulnerable with respect to available technology.
  • AIs might be morally relevant, but be treated badly.

(EDIT: I've now removed a section on Paul Christiano's “slow-rolling catastrophe” argument, since he says he didn't intend it to be about narrow, non-agentic AIs. It can still be found, along with an extensive discussion between us on the topic, on the Alignment Forum version of this post.)

I’ve already done a deep dive on the second species argument, so in this post I’m going to focus on the others - the risks which don’t depend on thinking of AIs as autonomous agents with general capabilities. Warning: this is all very speculative; I’m mainly just trying to get a feeling for the intellectual terrain, since I haven’t seen many explorations of these concerns so far.

Inequality and totalitarianism

One key longtermist concern about inequality is that certain groups might get (semi)permanently disenfranchised; in other words, suboptimal values might be locked in. Yet this does not seem to have happened in the past: moral progress has improved the treatment of slaves, women, non-Europeans, and animals over the last few centuries, despite those groups starting off with little power. It seems to me that most of these changes were driven by the moral concerns of existing elites, backed by public sentiment in wealthy countries, rather than improvements in the bargaining position of the oppressed groups which made it costlier to treat them badly (although see here for an opposing perspective). For example, ending the slave trade was very expensive for Britain; the Civil War was very expensive for the US; and so on. Perhaps the key exception is the example of anti-colonialist movements - but even then, public moral pressure (e.g. opposition to harming non-violent protesters) was a key factor.

What would reduce the efficacy of public moral pressure? One possibility is dramatic increases in economic inequality. Currently, one limiting factor on inequality is the fact that most people have a significant amount of human capital, which they can convert to income. However, AI automation will make most forms of human capital much less valuable, and therefore sharply increase inequality. This didn’t happen to humans after the industrial revolution, because human intellectual skills ended up being more valuable in absolute terms after a lot of physical labour was automated. But it did happen to horses, who lost basically all their equine capital.

Will any human skills remain valuable after AGI, or will we end up in a similar position to horses? I expect that human social skills will become more valuable even if they can be replicated by AIs, because people care about human interaction for its own sake. And even if inequality increases dramatically, we should expect the world to also become much richer, making almost everyone wealthier in absolute terms in the medium term. In particular, as long as the poor retain levels of political power comparable to what they have today, they can use that power to push the rich to redistribute wealth. This will be easiest on a domestic level, but it also seems that citizens of wealthy countries are currently sufficiently altruistic to advocate for transfers of wealth to poorer countries, and will do so even more if international inequality grows.

So to a first approximation, we can probably think about concerns about inequality as a subset of concerns about preventing totalitarianism: mere economic inequality within a (somewhat democratic) rule of law seems insufficient to prevent the sort of progress that is historically standard, even if inequality between countries dramatically increases for a time. By contrast, given access to AI technology which is sufficiently advanced to confer a decisive strategic advantage, a small group of elites might be able to maintain power indefinitely. The more of the work of maintaining control is outsourced to AI, the smaller that group can be; the most extreme case would be permanent global totalitarianism under a single immortal dictator. Worryingly, if there’s no realistic chance of them being overthrown, they could get away with much worse behaviour than most dictators - North Korea is a salient example. Such scenarios seem more likely in a world where progress in AI is rapid, and leads to severe inequality. In particular, economic inequality makes subversion of our political systems easier; and inequality between countries makes it more likely for an authoritarian regime to gain control of the world.

In terms of direct approaches to preventing totalitarianism, I expect it will be most effective to apply existing approaches (e.g. laws against mass surveillance) to new applications powered by AI; but it’s likely that there will also be novel and valuable approaches. Note, finally, that these arguments assume a level of change comparable to the industrial revolution; however, eventually we’ll get far beyond that (e.g. by becoming posthuman). I discuss some of these long-term considerations later on.

A vulnerable world

This section is roughly in line with Bostrom’s discussion of the vulnerable world hypothesis, although at the end I also talk about some ways in which new technologies might lead to problematic structural shifts rather than direct vulnerabilities. Note that I discuss some of these only briefly; I’d encourage others to investigate them in greater detail.


It may be the case that human psychology is very vulnerable to manipulation by AIs. This is the type of task on which a lot of data can be captured (because there are many humans who can give detailed feedback); the task is fairly isolated (manipulating one human doesn’t depend much on the rest of the world); and the data doesn’t become obsolete as the world changes (because human psychology is fairly stable). Even assuming that narrow AIs aren’t able to out-argue humans in general, they may nevertheless be very good at emotional manipulation and subtle persuasion, especially against humans who aren’t on their guard. So we might be concerned that some people will train narrow AIs which can be used to manipulate people’s beliefs or attitudes. We can also expect that there will be a spectrum of such technologies: perhaps the most effective will be direct interaction with an AI able to choose an avatar and voice for itself. AIs might also be able to make particularly persuasive films, or ad campaigns. One approach I expect to be less powerful, but perhaps relevant early on, is an AI capable of instructing a human on how to be persuasive to another human.

How might this be harmful to the long-term human trajectory? I see two broad possibilities. The first is large-scale rollouts of weaker versions of these technologies, for example by political campaigns in order to persuade voters, which harms our ability to make good collective decisions; I’ll call this the AI propaganda problem. (This might also be used by corporations to defend themselves from the types of punishments I discussed in the previous section). The second is targeted rollouts of more powerful versions of this technology, for example aimed at specific politicians by special interest groups, which will allow the attackers to persuade or coerce the targets into taking certain actions; I’ll call this the AI mind-hacking problem. I expect that, if mind-hacking is a real problem we will face, then the most direct forms of it will quickly become illegal. But in order to enforce that, detection of it will be necessary. So tools which can distinguish an AI-generated avatar from a video stream of a real human would be useful; but I expect that they will tend to be one step behind the most sophisticated generative tools (as is currently the case for adversarial examples, and cybersecurity). Meanwhile it seems difficult to prevent AIs being trained to manipulate humans by making persuasive videos, because by then I expect AIs to be crucial in almost every step of video production.

However, this doesn’t mean that detection will be impossible. Even if there’s no way to distinguish a video stream of a real human from an AI avatar, in order to carry out mind-hacking the AI will need to display some kind of unusual behaviour; at that point it can be flagged and shut down. Such detection tools might also monitor the mental states of potential victims. I expect that there would also be widespread skepticism about mind-hacking at first, until convincing demonstrations help muster the will to defend against it. Eventually, if humans are really vulnerable in this way, I expect protective tools to be as ubiquitous as spam filters - although it’s not clear whether the offense-defense balance will be as favourable to defense as it is in the case of spam. Yet because elites will be the most valuable targets for the most extreme forms of mind-hacking, I expect prompt action against it.

AI propaganda, by contrast, will be less targeted and therefore likely have weaker effects on average than mind-hacking (although if it’s deployed more widely, it may be more impactful overall). I think the main effect here would be to make totalitarian takeovers more likely, because propaganda could provoke strong emotional reactions and political polarisation, and use them to justify extreme actions. It would also be much more difficult to clamp down on than direct mind-hacking; and it’d target an audience which is less informed and less likely to take protective measures than elites.

One closely-related possibility is that of AI-induced addiction. We’re already seeing narrow AI used to make various social media more addictive. However, even if such media become as addictive as heroin, plenty of people manage to avoid heroin because its addictiveness is widely known. Even though certain AI applications are much easier to start using than heroin, I expect similar widespread knowledge to arise, and tools (such as website blockers) to help people avoid addiction. So it seems plausible that AI-driven addiction will be a large public health problem, but not a catastrophic threat.

The last possibility along these lines I’ll discuss is AI-human interactions replacing human-human interactions - for example, if AI friends and partners become more satisfying than human friends and partners. Whether this would actually be a bad outcome is a tricky moral question; but either way, it definitely opens up more powerful attack vectors for other forms of harmful manipulation, such as the ones previously discussed.

Centralised control of important services

It may be the case that our reliance on certain services - e.g. the Internet, the electrical grid, and so on - becomes so great that their failure would cause a global catastrophe. If these services become more centralised - e.g. because it’s efficient to have a single AI system which manages them - then we might worry that a single bug or virus could wreak havoc.

I think this is a fairly predictable problem that normal mechanisms will handle, though, especially given widespread mistrust of AI, and skepticism about its robustness.

Structural risks and destructive capabilities

Zwetsloot and Dafoe have argued that AI may exacerbate (or be exacerbated by) structural problems. The possibility which seems most pressing is AI increasing the likelihood of great power conflict. As they identify, the cybersecurity dilemma is a relevant consideration; and so is the potential insecurity of second-strike capabilities. Novel weapons may also have very different offense-defense balances, or costs of construction; we currently walk a fine line between nuclear weapons being sufficiently easy to build to allow Mutually Assured Destruction, and being sufficiently hard to build to prevent further proliferation. If those weapons are many times more powerful than nuclear weapons, then preventing proliferation becomes correspondingly more important. However, I don’t have much to say right now on this topic, beyond what has already been said.

A digital world

We should expect that we will eventually build AIs which are moral patients, and which are capable of suffering. If these AIs are more economically useful than other AIs, we may end up exploiting them at industrial scales, in a way analogous to factory farming today.

This possibility relies on several confusing premises. First is the question of moral patienthood. It seems intuitive to give moral weight to any AIs that are conscious, but if anything this makes the problem thornier. How can we determine which AIs are conscious? And what does it even mean, in general, for AIs very different from current sentient organisms to experience positive or negative hedonic states? Shulman and Bostrom discuss some general issues in the ethics of digital minds, but largely skim over these most difficult questions.

It’s easier to talk about digital minds which are very similar to human minds - in particular, digital emulations of humans (aka ems). We should expect that ems differ from humans mainly in small ways at first - for example, they will likely feel more happiness and less pain - and then diverge much more later on. Hanson outlines a scenario where ems, for purposes of economic efficiency, are gradually engineered to lack many traits we consider morally valuable in our successors, and then end up dominating the world. Although I’m skeptical about the details of his scenario, it does raise the crucial point that the editability and copyability of ems undermine many of the safeguards which prevent dramatic value drift in our current civilisation.

Even aside from resource constraints, though, other concerns arise in a world containing millions or billions of ems. Because it’s easy to create and delete ems, it will be difficult to enforce human-like legal rights for them, unless the sort of hardware they can run on is closely monitored. But centralised control over hardware comes with other problems - in particular, physical control over hardware allows control over all the ems running on it. And although naturally more robust than biological humans in many ways, ems face other vulnerabilities. For example, once most humans are digital ems, computer viruses will be a much larger (and potentially existential) threat.


Based on this preliminary exploration, I’m leaning towards thinking about risks which might arise from the development of advanced narrow, non-agentic AI primarily in terms of the following four questions:

  1. What makes global totalitarianism more likely?
  2. What makes great power conflict more likely?
  3. What makes misuse of AIs more likely or more harmful?
  4. What vulnerabilities may arise for morally relevant AIs or digital emulations?






I think [the risk of letting single AI systems control essential products like the internet or electrical grids] is a fairly predictable problem that normal mechanisms will handle, though, especially given widespread mistrust of AI, and skepticism about its robustness.

I was wondering if this neglects the risks of some agents unilaterally using AI systems to control those services, e.g. we might worry about narrow AI finding ways to manipulate stock markets, which (speaking as someone with 0 knowledge) naively doesn't seem easily fixed with existing mechanisms. E.g. the flash crash from 2010 seems like evidence for the fragility

New regulations put in place following the 2010 flash crash[10] proved to be inadequate to protect investors in the August 24, 2015, flash crash — "when the price of many ETFs appeared to come unhinged from their underlying value"[10] — and ETFs were subsequently put under greater scrutiny by regulators and investors.[10] https://en.wikipedia.org/wiki/2010_flash_crash#Overview

One possibility that maybe you didn't close off (unless I missed it) is "death by feature creep" (more likely "decline by feature creep").  It's somewhat related to the slow-rolling catastrophe, but with the assumption that AI (or systems of agents including AI,  also involving humans) might be trying to optimize for stability and thus regulate each other, as well as trying to maximize some growth variable (innovation, profit).

Our inter-agent (social, regulatory, economic, political) systems were built by the application of human intelligence, to the point that human intelligence can't comprehend the whole, making it hard to solve systemic problems. So in one possible scenario, humans plus narrow AI might simplify the system at first, but then keep adding features to the system of civilization until it is unwieldy again. (Maybe a superintelligent AGI could figure it out? But if it started adding its own features, then maybe not even it would understand what had evolved.) Complexity can come from competitive pressures, but also from technological innovations. Each innovation stresses the system, until the system can assimilate it more or less safely, by means of new regulation (social media that messes up politics unless / until we can break or manage some of its power).

Then, if some kind of feedback loop leading toward civilizational decline begins, general intelligences (humans, if humans are the only general intelligences) might be even less capable of figuring out how to reverse course than they currently are.  In a way, this could be narrow AI as just another important technology, marginally complicating the world.  But also,  we might use narrow AI as tools in AI/AI+humans governance (or perhaps in understanding innovation), and they might be capable of understanding things that we cannot (often things that AI themselves made up), creating a dependency that could contribute in a unique way to a decline.  

(Maybe "understand" is the wrong word to apply to narrow AI but "process in a way sufficiently opaque to humans" works and is as bad.)
