Spreading messages to help with the most important century

Holden Karnofsky

Killer Apps and Technology Roulette are interesting pieces trying to sell policymakers on the idea that “superiority is not synonymous with security.” ↩
When I imagine what the world would look like without any of the efforts to “raise awareness,” I picture a world with close to zero awareness of - or community around - major risks from transformative AI. While this world might also have more time left before dangerous AI is developed, on balance this seems worse. A future piece will elaborate on the many ways I think a decent-sized community can help reduce risks. ↩
I do think “AI could be a huge deal, and soon” is a very important point that somewhat serves as a prerequisite for understanding this topic and doing helpful work on it, and I wanted to make this idea more understandable and credible to a number of people - as well as to create more opportunities to get critical feedback and learn what I was getting wrong.
But I was nervous about the issues noted in this section. With that in mind, I did the following things:
- The title, “most important century,” emphasizes a time frame that I expect to be less exciting/motivating for the sorts of people I’m most worried about (compared to the sorts of people I most wanted to draw in).
- I tried to persistently and centrally raise concerns about misaligned AI (raising it in two pieces, including one (guest piece) devoted to it, before I started discussing how soon transformative AI might be developed), and extensively discussed the problems of overemphasizing “competition” relative to “caution.”
- I ended the series with a piece arguing against being too “action-oriented.”
- I stuck to “passive” rather than “active” promotion of the series, e.g., I accepted podcast invitations but didn’t seek them out. I figured that people with proactive interest would be more likely to give in-depth, attentive treatments rather than low-resolution, oversimplified ones.
I don’t claim to be sure I got all the tradeoffs right. ↩
There are some papers arguing that AI systems do things something like this (e.g., see the “Challenges” section of this post), but I think the dynamic is overall pretty far from what I’m most worried about. ↩
E.g., public benefit corporation ↩

Henry Howard🔸

I don't like this post and I don't think it should be pinned to the forum front page.

A few reasons:

The general message of: "go and spread this message, this is the way to do it" is too self-assured, and unquestioning. It appears cultish. It's off-putting to have this as the first thing that forum visitors will see.
The thesis of the post is that a useful thing for everyone to do is to spread a message about AI safety, but it's not clear what messages you think should be being spread. The only two I could see are "relate it to Skynet" and "even if AI looks safe it might not be".
Too many prerequisites: this post refers to five or ten other posts as a "this concept is properly explained here" thing. Many of these posts reference further posts. This is a red flag to me of poor writing and/or poor ideas. Either a) your ideas are so complex that they do indeed require many thousands of words to explain (in which case, fine), or b) they're not that complex, and just aren't being communicated well, or c) bad ideas are being obscured in a tower of readings that gatekeep the critics away. I'd like to see the actual ideas you're referring to expressed clearly, instead of referring to other posts.
Having this pinned to the front page further reinforces the disproportionate focus that AI Safety gets on the forum

NunoSempere

Personally, an argument I would find more compelling is to note that the OP doesn't answer comments, which lowers the value of discussion and makes the post less interesting for a public forum. Also, there is already a newsletter for Cold Takes that people can subscribe to.

Holden Karnofsky

Noting that I’m now going back through posts responding to comments, after putting off doing so for months - I generally find it easier to do this in bulk to avoid being distracted from my core priorities, though this time I think I put it off longer than I should’ve.

It is generally true that my participation in comments is extremely sporadic/sparse, and folks should factor that into curation decisions.

sqgroves

These don't seem very compelling to me.

This argument proves too much. The same could be said of "go and donate your money, this (list of charities we think are most effective) is the way to do it".
My takeaway was that messages which could be spread include: "we should worry about conflict between misaligned AI and all humans", "AIs could behave deceptively, so evidence of safety might be misleading", "AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems", "alignment research is prosocial and great" and "we’re not ready for this". (I excluded "it might be important for companies and other institutions to act in unusual ways", because I agree this doesn't seem like a straightforward message to spread).
The answer is probably (a).
"Disproportionate" seems like it boils down to an object-level disagreement about relative cause prioritisation between AI safety and other causes.

freedomandutility

I like the framing "bad ideas are being obscured in a tower of readings that gatekeep the critics away" and I think EA is guilty of this sometimes in other areas too.

Holden Karnofsky

Just noting that many of the “this concept is properly explained elsewhere” links are also accompanied by expandable boxes that you can click to expand for the gist. I do think that understanding where I’m coming from in this piece requires a bunch of background, but I’ve tried to make it as easy on readers as I could, e.g. explaining each concept in brief and providing a link if the brief explanation isn’t clear enough or doesn’t address particular objections.

Lauren Maria

I agree. I’m curious what the process is for deciding what gets pinned to the front page. Does anyone know?

Lizka

Hi! The process for curation is outlined here. In short, some people can suggest curation, and I currently make the final calls.

You can also see a list of other posts that have been curated (you can get to the list by clicking on the star next to a curated post's title).

Lauren Maria

Oh, I see! Thanks, that's helpful.

Lizka

Thanks for writing this! I'm curating it.

Some things I really appreciate about the post:

The claim (paraphrased), "it is pretty easy to get AI safety messaging wrong, but there are some useful things to communicate about AI safety" seems important (and right — I've also seen examples of people accidentally spreading the idea that "AI will be powerful"). I also think lots of people in the EA community should hear it — a good number of people are in fact working on "spreading the ideas of AI safety" (see a related topic page).
It's very nice to have more content on things that ~everyone can help with.
1. "practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. [...] I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously."
The lists of kinds of messages that are risky/helpful are helpful:
1. Risky (presumably not an exhaustive list!):
  1. messages that generically emphasize the importance and potential imminence of powerful AI systems
  2. messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks (where one of the risks is, "If people have a bad model of how and why AI could be risky/dangerous (missing key risks and difficulties), they might be too quick to later say things like “Oh, turns out this danger is less bad than I thought, let’s go full speed ahead!”")
2. Helpful + right (This list is presumably also not exhaustive. I should also say that I'm least optimistic about iii (sort of) and v.)
  1. [S] We should worry about conflict between misaligned AI and all humans
  2. [S] AIs could behave deceptively, so “evidence of safety” might be misleading
  3. [S] AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems
  4. [S] Alignment research is prosocial and great
  5. [S] It might be important for companies (and other institutions) to act in unusual ways
  6. [S] We’re not ready for this

One question/disagreement/clarification I have about the statement, "I’m not excited about blasting around hyper-simplified messages."

The word "simplified" is a bit vague; I think I disagree with some interpretations of the sentence. I agree that "it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea," but I think in some cases, "simplifying" could be really useful for spreading more accurate messages. In particular, "simplifying" could mean something like "dumbing down somewhat indiscriminately" — which is bad/risky — or it could mean something like "shortening and focusing on the key points, making technical points accessible to a more general audience, etc." — something like distillation. The latter approach seems really useful here, in part because it might help overcome a big problem in AI safety messaging: that a lot of the key points about risk are difficult to understand, and that important texts are technical. This means that it's easy to be shown cool demos of new AI systems, but not as easy to understand the arguments that explain why progress in AI might be dangerous. (So people trying to make the case in favor of safety might resort to deferring to experts, get the messages wrong in ways that make the listener unnecessarily skeptical of the overall case, etc.)
(More minor: I also think that the word "blast" has negative connotations which make it harder to correctly engage with the sentence. I think you mean "I'm not excited about sharing hyper-simplified messages in a way that reaches a ~random-but-large subset of people." I think I agree — it seems better to target a particular audience — but the way it's currently stated makes it harder to disagree; it's harder to say, "no, I think we should in fact blast some messages" than it is to say, "I think there are some messages that appeal to a very wide range of audiences," or to say "I think there are some messages we should promote extensively.")

(I should say that the opinions I'm sharing here are mine, not CEA's. I also think a lot of my opinions here are not very resilient.)

Marc Wong

Whether it’s a knife, a car, social media, or artificial intelligence, technology is power.

There’s no reason why we shouldn’t use the familiar and mature car safety culture and practices to improve AI (and other technologies’) safety.

This means user training (driver licenses), built-in safety features (e.g., seat belts, air bags), frequent public service announcements, independent and rigorous safety and reliability reviews, rules and regulations (traffic rules), enforcement (traffic police), insurance, development and testing in controlled environments, guards against deliberate or accidental misuse, guards against (large) advances with (large) uncertainties, and promoting safe attitudes and mutual accountability (e.g., reject road rage).

If we can’t educate the public, media, technologists, and politicians in simple, engaging terms, and inspire them to take action, then we’ll always be at risk.

Technology is Power: Raising Awareness Of Technological Risks

Arden Koehler

Nice post. One thought on this - you wrote:

"I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans)."

I agree that this seems good for avoiding the anthropomorphism (in perception and in one's own thought!) but I think it'll be important to emphasise when doing this that these are conceivable ways and ultimately possible examples rather than the whole risk-case. Why? People might otherwise think that they have solved the problem when they've ruled out or fixed that particular problematic mechanism, when really they haven't. Or when the more specific mechanistic descriptions probably end up wrong in some way, the whole case might be dismissed - when the argument for risk didn't ultimately depend on those particulars.

(this only applies if you are pretty unconfident in the particular mechanisms that will be risky vs. safe)

[written in my personal capacity]

Holden Karnofsky

Agreed!

Peter Slattery 🔸

Thanks for sharing these suggestions, they are very helpful.

A quick comment:

I am also excited by the idea of spreading these messages and doing it well. I suspect that outside of my EA contacts most people in my network have never received a single message that made them aware of the more serious risks of AI, or know of a good source to learn about them. Given that awareness of opportunity is a prerequisite for desirable reactions (e.g., changes in career choice or personal advocacy), this seems very suboptimal.

I recently attempted to spread some AI risk related messages on LinkedIn. (Everyone on LinkedIn sees many posts about ChatGPT, so I no longer assign much probability to the chance that someone who reads a post about AI Safety will become aware of the potential of AI and decide to speed up capabilities research instead.)

When doing the posts I attempt to find a 'hook' that gets attention (e.g., I link and discuss an interesting video or outline some ways to use GPT3 - see posts linked below), share some personal views, then segue into a nudge to read a good source of AI risk related information.

My hope is that doing this occasionally, over time, can increase awareness of, and engagement with, good sources of information on AI risk, and have positive flow-on effects, etc.

What is the likely alternative?

If I don't do posts like these it seems very unlikely that the people who read my posts will find out about AI risk for an extended period. I have yet to see a post on LinkedIn which mentions, or links to, an 'EA perspective' on AI risk. Rarely do I see anything negative about AI; when I do, such posts are focused on the short-term risks related to unemployment, etc.

However, I find it hard (at least within the time I assign to writing content for LinkedIn) to communicate complex ideas while also engaging people on social media, and I wonder if I am simplifying things too much in my content.

With that in mind, I'd appreciate feedback from anyone who is interested. This could be on my thoughts above, or on my two posts so far (see the two comments with links that I will add below). To leave feedback on the posts, please vote agree (if it seems ok/good to post like this) or disagree (if you think it is better not to), or reply to the relevant comment. Thanks!

Akash Kulgod

No concrete useful feedback, just a note that I thought both posts were artfully tailored to your purpose and medium, nicely done!

Peter Slattery 🔸

Thanks, Akash, I really appreciate that you reviewed them and shared that!

Agree that in isolation, spreading the ideas of

(a) AI could be really powerful and important within our lifetimes

and

(b) Building AI too quickly/ incautiously could be dangerous

Could backfire.

But I think just removing the "incautiously" element, and focusing on the "too quickly" element, and adding

Should be pretty effective in preventing people from thinking that we should race to creating AGI.

So essentially: AI could be really powerful, building it too quickly could be dangerous, and we should fund lots of AI Safety research before it's invented. I think adding more fidelity / detail / nuance would be net negative, given that it would slow down the spread of the message.

Also, I think we shouldn't take things OpenAI and DeepMind say at face value, and bear in mind the corrupting influence of the profit motive, motivated reasoning and 'safetywashing'.

Just because someone says they're making something that could make them billions of dollars because they think it will benefit humanity, doesn't mean they're actually doing it to benefit humanity. What they claim is a race to make safe AGI is probably significantly motivated by a race to make lots of money.

T_W

I think this series is out of order, at least if it's intended to be read in release order based on your website (and the intro here, "In this more recent series, I’ve been trying to...", indicates to me that this is not the start but rather an article from the middle of the series, and that I should stick with the release order to read the series).

Trev Prew

I have a hammer, I could build a shed with it or I could murder someone with it. Should the inventors of hammers, way back when, have prepared a risk assessment of the likely impact of their invention and taken appropriate action? Or is the problem with me and my potentially deranged use of a hammer? The underlying problem expressed in this article is fear. Fear of the unknown future, which was ably expressed over a hundred years ago in Mary Shelley's book Frankenstein. The book has never been out of print as, much like horror movies, people love being scared, in a safe sort of way. Humans can't make monsters, dinosaurs are extinct, and computers just do what they are programmed to do by humans - they are just another tool for our use, not the end of humanity. The article has no basis in fact and is therefore a work of fiction. I'm not scared of my hammer and nobody should be scared of AI.

If you really want to make the world a better place, I suggest that more time is devoted to finding out why someone would use a hammer or AI for destructive use, or more relevant to the present, why we continue to use fossil fuels, when we know it is wrecking our only home, i.e., planet Earth.

“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?
Problem	Key question	Explanation
The Lance Armstrong problem	Did we get the AI to be actually safe or good at hiding its dangerous actions?	When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The King Lear problem	The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?	It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
The lab mice problem	Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities?	Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice.
The first contact problem	Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?	AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).

Situation	Appropriate reaction (IMO)
"This could be a billion-dollar company!"	"Woohoo, let's GO for it!"
"This could be the most important century!"	"... Oh ... wow ... I don't know what to say and I somewhat want to vomit ... I have to sit down and think about this one."

Challenges of AI-related messages

Messages that seem risky to spread in isolation

Messages that seem important and helpful (and right!)

We should worry about conflict between misaligned AI and all humans

AIs could behave deceptively, so “evidence of safety” might be misleading

AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems

Alignment research is prosocial and great

It might be important for companies (and other institutions) to act in unusual ways

We’re not ready for this

How to spread messages like these?

Footnotes