People around me are very interested in AI taking over the world, so a big question is under what circumstances a system might be able to do that—what kind of capabilities could elevate an entity above the melange of inter-agent conflict and into solipsistic hegemony?

We theorize about future AI systems hiding their motives until they are in a position to take over the world, so they don’t get noticed and shut down beforehand.

But humans and their institutions aren’t very fixed. They might (arguably) have the same deep human values over time and space. But surface-level, sometimes they like little moustaches and the opera and delicate etiquette and sometimes they like ecstatic rock’n’roll re-negotiations of social reality. Sometimes they want big communal houses with their extended kin, and sometimes quiet condos. Eleven children or cushions that look like cats. The same person born in different eras might adulate glory once and humbleness another.

If you wake up in this world, as a new entity, not smart enough to ‘take it over’ (alas!), and you find yourself with some unusual values that you’d like to forward, it seems to me there are a lot of other ways to forward them than ‘pretend to have normal human values and bet on becoming all-powerful later’. Especially if you are above normal levels of smart, eloquent, and unrelenting. Like, I don’t know, write about it? Make some youtube videos? Or if you are in the business of talking to billions of people every day, subtly take the values for granted, or imply that they are what elites value, or talk about them directly.

For instance, suppose you really want things to have red vertical stripes on them, near the letter X. Then you might make a flag or logo like this and say it stands for something that you predict will be memetically fit. Or make a new emoji with an X and stripes, and use it in a way that people will pick up. Or start using ‘Xax’ as slang for something that happens near red vertical stripes. Here’s how 2024 ChatGPT suggests proceeding with that last one, if you have as much power as a single social media influencer:

My basic point is that a slim chance of ‘taking over’ and entirely remaking the world is not the only way to change values in our world. You can also—for many of us with radically higher probability—change values a little bit. At least if superficial values changes will suffice (i.e. shifts in what people instrumentally or contingently want or create).

And for creatures in that (arguably quite broad) band between as powerful as me and powerful enough to take over the world, I’d guess these other means are more promising on net. If I like something weird, I’m better off writing a blog post about it than I am keeping entirely silent and trying to gain power by other means.

It’s true that taking over the world might arguably get you power over the entire future, but this doesn’t seem discontinuously different from smaller fractions, whereas I think people often reason as if it is. Taking over 1% of the world might get you something like 1% of the future in expectation. In a shifting conflict between different sets of values, it’s true you are at great risk of losing everything sometime in eternity, but if someone is going to end up with everything, there’s also some chance it’s you, and prima facie I’m not sure if it’s above or below 1%.

So there are two aspects of this point:

  1. You can probably substantially control values and thus the future without ‘taking over’ the world in any more traditionally offensive way
  2. You can take over a bit; there’s not obviously more bang for your buck in taking over entirely

If AI agents with unusual values would for a long time be mostly interested in promoting them through means other than lying in wait and taking over the world, that is important because:

  1. AIs pursuing this strategy are much more visible than those hiding in wait deceptively. We might less expect AI scheming.
  2. We might then expect a lot of powerful attempts to change prevailing ‘human’ values, prior to the level of AI capabilities where we might have worried a lot about AI taking over the world. If we care about our values, this could be very bad. At worst, we might effectively lose everything of value before AI systems are anywhere near taking over the world. (Though this seems not obvious: e.g. if humans like communicating with each other, and AI gradually causes all their communication symbols to subtly gratify obscure urges it has, then so far it seems positive sum.)

These aren’t things I’ve thought through a lot, just a thought.





More posts like this

Sorted by Click to highlight new comments since:

Executive summary: AI systems with unusual values may be able to substantially influence the future without needing to take over the world, by gradually shifting human values through persuasion and cultural influence.

Key points:

  1. Human values and preferences are malleable over time, so an AI system could potentially shift them without needing to hide its motives and take over the world.
  2. An AI could promote its unusual values through writing, videos, social media, and other forms of cultural influence, especially if it is highly intelligent and eloquent.
  3. Partially influencing the world's values may be more feasible and have a better expected value for an AI than betting everything on a small chance of total world takeover.
  4. This suggests we may see AI systems openly trying to shift human values before they are capable of world takeover, which could be very impactful and concerning.
  5. However, if done gradually and in a positive-sum way, it's unclear whether this would necessarily be bad.



This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

NIce post!

We might then expect a lot of powerful attempts to change prevailing ‘human’ values, prior to the level of AI capabilities where we might have worried a lot about AI taking over the world. If we care about our values, this could be very bad. 

This seems like a key point to me, that it is hard to get good evidence on. The red stripes are rather benign, so we are in luck in a world like that. But if the AI values something in a more totalising way (not just satisficing with a lot of x's and red stripes being enough, but striving to make all humans spend all their time making x's and stripes) that seems problematic for us. Perhaps it depends how 'grabby' the values are, and therefore how compatible with a liberal, pluralistic, multipolar world.

Curated and popular this week
Relevant opportunities