Talking publicly about AI risk

Jan_Kulveit

In the past year, I have started talking about AI risk publicly - in mainstream newspapers, public radio, national TV, some of the most popular podcasts. The twist, and the reason why you probably haven't noticed, is that I'm doing this in Czech. This has a large disadvantage - the positive impact is quite limited, compared to English. On the other hand, it also has a big advantage - the risk is very low, because it is very hard for memes and misunderstandings to escape the language bubble. Overall I think this is great for experiments with open public communication.

The following is an off-the-cuff list of notes and suggestions. In my view the debate in Czech media and on Czech social networks is on average marginally more sensible and more informed than in English so far, so perhaps part of this was successful and could be useful for others.

Context: my views

For context, it's probably good to briefly mention some of my overall views on AI risk, because they are notably different from some other views.

I do expect

1. Continuous takeoff, and overall a large amount of continuity of agency (note that continuous does not imply things move slowly)

2. Optimization and cognition distributed across many systems to be more powerful than any single system, making a takeover by a single system possible but unlikely

3. Multiagent interactions to matter

4. I also do expect the interactions between the memetics and governance and the so-called "technical problem" to be strong and important

As a result I also expect

5. There will be warning shots

6. There will be cyborg periods

7. The world will get weird

8. Coordination mechanisms do matter

This perspective may be easier to communicate than e.g. sudden foom - although I don't know.

In the following I'll usually describe my approach, and illustrate it with actual quotes from published media interviews I'm sort of happy about (translated, unedited). Note that the specific phrasings and metaphors are rarely original.

Aim to explain, not to persuade

Overall I usually try to explain stuff and answer questions, rather than advocate for something. I'm optimistic about the ability of the relevant part of the public to actually understand a large part of the risk at a coarse-grained level, given enough attention.

So even though we invented these machines ourselves, we don't understand them well enough?
We know what's going on there at the micro level. We know how the systems learn. If one number in a series changes, we know how the next one changes. But there are tens of billions of such numbers. In the same way, we have some idea of how a neuron works, and we have maps of a network of thousands of neurons. But that doesn't tell us that much about how human thinking works at the level of ideas.

Small versions of scaled problems

Often, I think the most useful thing to convey is a scaled-down, easier version of the scaled problem, such that thinking about the smaller version leads to correct intuitions about the scaled problem, or solutions to the scaled-down problem may generalise to the later, scaled problem.

This often requires some thought or finding a good metaphor.

Couldn't we just shut down such a system?
We already have a lot of systems that we could hypothetically shut down, but if you actually tried to do so, it would be very difficult. For example, it is practically impossible to shut down the New York Stock Exchange, because there will always be enough people defending it. If the model manages to penetrate deeply enough into humanity's activities, it is imaginable that humanity will actually lose control of it at some point.

Don't focus on one scenario

Overall, I think it's possible to explain that, in the face of AI risk, there isn't one particular story we can identify and prevent; the problem is that unaligned powerful systems can find many ways to threaten you.

How can we imagine the threat associated with the development of artificial intelligence? I expect not as shooting robots.
Not shooting robots, of course. I'd put it another way. One of the reasons humans have become dominant on the planet is their intelligence. Not only that, but it is also our ability to work together in large groups and share ideas.

However, if you look at the evolution of mankind from the perspective of a chimpanzee or a mammoth, at one point something started happening for them that they no longer understood. Today, people can tame the chimpanzee or kill it in a staggering number of ways that they no longer understand. The chimp doesn't understand what happens when the tranquilliser injection hits.

And we could be in a similar position the moment we lose control of artificial intelligence.

…

I'm rather reluctant to describe one particular scenario. If a system is much more intelligent than I am, it can naturally develop ways to limit or threaten me that I can't even imagine.

Some of the people who study the risks of AI think that someone is going to plug in a large system, it will somehow escape, and we will lose control of it very quickly. It could even happen that we're all dead at once and we don't know why.

Personally, I think it's more likely that there will be some kind of continuous loss of control. Nothing dramatic will happen, but more likely we'll get to a state where we don't understand what's going on with the world, and we won't be able to influence it. Or we'll get a sense of some unrealistic picture of the world in which we're happy and we won't complain, but we won't decide anything.

But if I describe a very specific scenario to you, you would argue that something like that can be prevented in advance or easily solved. But that's the problem I was describing a moment ago. We'd just be talking like two chimpanzees who also can't even imagine most of the ways humans can threaten them.

The doom memeplex

Overall, I'm not excited about summoning the doom memeplex into public consciousness. (In sharp contrast to Eliezer Yudkowsky, who seems to be investing heavily in this summoning.) Why?

Mostly, I don't trust that the doom memeplex is all that helpful in solving the alignment problem. I broadly agree with Valentine that being in a state of mental pain and terror isn't a great state to see the problem clearly enough.

Also similarly to AIs, memeplexes are much easier to summon than control.

Also similarly to some AIs, memeplexes have convergent instrumental goals and self-interests. Some worrisome implications are, among others:

- it's not in the self-interest of the doom memeplex to recognize alignment solutions

- it is in the self-interest of the doom memeplex to reward high p(doom) beliefs

- the AGI doom memeplex has, to some extent, a symbiotic relationship with the race toward AGI memeplex

One specific implication is I find it much more productive and useful to focus on the fact that in AI risk scenarios we lose control over the future, rather than "your kids will die".

A colleague of mine describes it with the metaphor that we can find ourselves in the role of a herd of cows, whose fate is being determined by the farmer. And I don't think we want to get to that stage. While there are many interesting questions about what might happen to such a herd afterwards, I think they are distracting. We need to focus on not losing control of the AI.

Relevant maps

A large part of the difference between how the general public and ML specialists understand AI lies in which maps people rely on. The public often uses the map "like a human, but running on a computer" and has easy access to maps like "like a Google-size corporation, but automated". In contrast, many ML practitioners rely on maps like "my everyday experience with training ML models, which are small and sort of dumb".

A good understanding of AI risk usually requires thinking about multiple maps, but a basic understanding of the risk can actually be based on the maps accessible to the public.

The metaphor of maps is also useful in explaining why ML expertise is not enough, how it is possible that AI experts disagree, and how a layperson can orient toward who to trust.

For example, Yann LeCun, vice president of Meta and head of AI research at Facebook. Facebook has one of the worst reputations among the big players in terms of approach to safety. LeCun is a great expert in machine learning, but he also claims that the problem is much smaller than we think because, after all, as humanity we can tame various intelligent systems, such as raising children or regulating corporations. When the Vice President of Meta says this, it fills me with more dread than optimism. If I were to take the child-rearing metaphor seriously, it's like believing we can raise alien children.
…
For me, VP of Meta drawing his confidence from humanity's ability to align corporations is a reason to worry, not source of confidence.

The aim here is to explain that when e.g. Yann LeCun makes confident claims about AI risk, these are mostly not based on his understanding of machine learning, but often on analogies with different systems that the reader knows and is able to evaluate independently.

The overall experience

I'm quite picky about whom I talk to, refusing the majority of interview requests, but conditional on that, my experience so far has been generally positive. In particular, after interest in AI exploded following the release of ChatGPT and GPT-4, technical journalists became reasonably informed. There is no need to justify the plausibility of powerful AI systems anymore. Also, the idea of AI risk is firmly in the Overton window, and privately, a decent fraction of the people I talked with admitted being concerned themselves.

Some of the resulting artefacts in machine translation:

What to do in English

I'd be pretty excited if the AI alignment community were able to generate more people able to represent various views of the field publicly, in a manner which makes both the public and policymakers more informed. I think this is a particularly good fit for people with some publicly legible affiliations, such as academic roles or positions at tech companies not participating in the race directly.

Arden Koehler, Apr 25 2023

thanks for this post! I'm curious - can you explain this more?

"the AGI doom memeplex has, to some extent, a symbiotic relationship with the race toward AGI memeplex"

titotal, Apr 26 2023

My interpretation would be that they both tend to buy into the same premises that AGI will occur soon and that it will be godlike in power. Depending on how hard you believe alignment is, this would lead you to believe that we should build AGI as fast as possible (so that someone else doesn't build it first), or that we should shut it all down entirely.

By spreading and arguing for their shared premises, both the doomers and the AGI racers get boosted by the publicity given to the other, leading to growth for them both.

As someone who does not accept these premises, this is somewhat frustrating to watch.

[anonymous], Apr 26 2023

Maybe something like this: https://www.lesswrong.com/posts/KYzHzqtfnTKmJXNXg/the-toxoplasma-of-agi-doom-and-capabilities

Linch, Apr 26 2023

Thanks, I was thinking about linking the same thing.

David Johnston, Apr 26 2023

AFAIK the official MIRI solution to AI risk is to win the race to AGI but do it aligned.

Part of the MIRI theory is that winning the AGI race will give you the power to stop anyone else from building AGI. If you believe that, then it’s easy to believe that there is a race, and that you sure don’t want to lose.

Jan_Kulveit, May 22 2023

Sorry for the delay in response.

Here I look at it from a purely memetic perspective - you can imagine thinking as a self-interested memeplex. Note I'm not claiming this is the main useful perspective, or that this should be the main perspective to take.

Basically, from this perspective

* the more people think about the AI race, the easier it is to imagine AI doom. Also the specific artifacts produced by the AI race make people more worried - ChatGPT and GPT-4 likely did more for normalizing and spreading worry about AI doom than all the previous AI safety outreach together.

The more the AI race is a clear reality people agree on, the more attentional power and brainpower you will get.

* but also from the opposite direction... : one of the central claims of the doom memeplex is that AI systems will be incredibly powerful in our lifetimes - powerful enough to commit omnicide, take over the world, etc. - and that their construction is highly convergent. If you buy into this, and you are a certain type of person, you are pulled toward "being in this game". Subjectively, it's much better if you - the risk-aware, pro-humanity player - are at the front. The safety concerns of Elon Musk leading to the founding of OpenAI likely did more to advance AGI than all the advocacy of Kurzweil-type accelerationists until that point...

Empirically, the more people buy into the idea that "single powerful AI systems are incredibly dangerous", the more attention goes toward work on such systems.

Both memeplexes share a decent amount of maps, which tend to work as blueprints or self-fulfilling prophecies for what to aim for.

Oliver Sourbut, Apr 28 2023

Thank you for sharing this! Especially the points about relevant maps and Meta/FAIR/LeCun.

I was recently approached by the UK FCDO as a technical expert in AI with perspective on x-risk. We had what I think were very productive conversations, with an interesting convergence of my framings and the ones you've shared here - that's encouraging! If I find time I'm hoping to write up some of my insights soon.

Oliver Sourbut, May 15 2023

I wrote a little here about unpluggability (and crossposted on LessWrong/AF)

CB🔸, Apr 25 2023

Thanks - advice on "how to message complex things" is really useful - I'm always surprised by how neglected this is.

By the way, if at some point you were to redirect people toward a link explaining the problem with AI (article, website, video), as a resource they can use to understand the problem, what would you provide? I'm looking for a link in English - so far it's not clear what to point to.

For instance, the FLI tribune makes a clear case that many knowledgeable people care about this, but it's not very good at explaining what the risk really is.

utilistrutil, May 9 2023

I would endorse all of this based on experience leading EA fellowships for college students! These are good principles not just for public media discussions, but also for talking to peers.

Pato, May 4 2023

I doubt this:

I mean, if you say it could increase the number of people working on capabilities at first, I would agree, but it probably increases by a lot more the number of people working on safety and wanting to slow/stop capabilities research, which could produce legislation and, at the end of the day, increase the time until AGI.

As for the other cons of the doom memeplex, I kinda agree to a certain extent, but I don't see them coming even close to the pros of potentially having lots of people actually taking the problem very seriously.

Effective Altruism Forum