Effective Persuasion For AI Alignment Risk

Brian Lui

Comments 7

Sorted by

New & upvoted

I'm always keen to think about how to more effectively message EA ideas, but I'm not totally sure what the alternative, effective approach is. To clarify, do you think Nintil's argument is basically the right approach? If so, could you pick out some specific quotes and explain why/how they are less inferentially distant?

Jose Luis Ricon

Hi, I'm the author of Nintil.com (We met at Future Forum :)

Essentially, an essential rule in argumentation is that the premises have to be more plausible than the conclusion. For many people, foom scenarios, nanotech, etc makes them switch off.

I have this quote

Here I want to add that the lack of criticism is likely because really engaging with these arguments requires an amount of work that makes it irrational for someone who disagrees to engage. I make a similar analogy here with homeopathy: Have you read all the relevant homeopathic literature and peer-reviewed journals before dismissing it as a field? Probably not. You would need some reason to believe that you are going to find evidence that will change your mind in that literature. In the case of AI risk, the materials required to get someone to engage with the nanotech/recursive self-improvement cases should include sci-fi free cases for AI risk (Like the ones I gesture at in this post) and perhaps tangible roadmaps from our current understanding to systems closer to Drexlerian nanotech (Like Adam Marblestone's roadmap).

Basically, you can't tell people "Nanotech is super guaranteed to happen, check out this book from Drexler". If they don't agree with that, they won't read it, there is too much commitment. Instead, one should just start from premises the average person agrees with (speed, memory, strategic planning) and get to the "AI risk is worth taking seriously". That is a massive step up from them laughing at it. Then one can argue about timelines and how big of a risk it is, but first one has to bring them into the conversation, and my arguments accomplishes (I hope) that.

Miranda_Zhang

This makes a lot of sense, thanks so much!

I think I agree with this point, but in my experience I don't see many AI safety people using these inferentially-distant/extreme arguments in outreach. That's just my very limited anecdata though.

Brian Lui

Excellent! This is the core of the point that I wanted to communicate. Thanks for laying it out so clearly.

Brian Lui

Great! Yes. The key part I think is this:

Advanced nanotechnology, often self-replicating
Recursive self-improvement (e.g. an intelligence explosion)
Superhuman manipulation skills (e.g. it can convince anyone of anything)
There are exceptions to this, like the example I discuss in Appendix C.
I found that trying to reason about AGI risk scenarios that rely on these is hard because I keep thinking that these possibly run into physical limitations that deserve more thought before thinking they are plausible enough to substantially affect my thinking. It occurred to me it would be fruitful to reason about AGI risk taking these options off the table to focus on other reasons one might suspect AGIs would have overwhelming power:
Speed¹ (The system has fast reaction times)
Memory (The system could start with knowledge of all public data at the time of its creation, and any data subsequently acquired would be remembered perfectly)
Superior strategic planning (There are courses of actions that might be too complex for humans to plan in a reasonable amount of time, let alone execute)

My view is that normal people are unreceptive to arguments that focus on the first three (advanced nanotechnology, recursive self-improvement, superhuman manipulation skills). Leave aside whether these are probable or not. Just talking about it is not going to work, because the "ask" is too big. It would be like going to rural Louisiana and talking at them about intersectionality.

Normal people are receptive to arguments based on the last three (speed, memory, superior strategic planning). Nintil then goes on to make an argument based only on these ideas. This is persuasive. The reason is that it's easy for people to accept all three premises:

Computers are very fast. This accords with people's experience.
Computers can store a lot of data. People can understand this, too.
Superior strategic planning might be slightly trickier, but it's still easy to grasp, because people know that computers can beat the strongest humans at chess and go.

Miranda_Zhang

Thanks, this makes sense! Yeah, this is why many arguments I see start at a more abstract level, e.g.

We are building machines that will become vastly more intelligent than us (c.f. superior strategic planning), and it seems reasonable that then we won't be able to predict/control them
Any rational agent will strategically develop instrumental goals that could make it hard for us to ensure alignment (e.g., self-preservation -> can't turn them off)

Brian Lui

I might have entered at a different vector (all online) so I experienced a different introduction to the idea! If my experience is atypical, and most people get the "gentle" introduction you described, that is great news.

Comments

More from the author

New cause area: bivalve aquaculture

Brian Lui·4y ago·4m read

Against longtermism

Brian Lui·3y ago·8m read

Cause Exploration Prizes submission: bivalve aquaculture

Brian Lui·4y ago·13m read

Curated and popular this week

Cultivating hope: calibrating the expectations for cultivated meat to end factory farming

PabloAMC 🔸·1w ago·Curated 2d ago·22m read

GWWC's 2025 impact evaluation (executive summary)

Aidan Whitfield🔸, Giving What We Can🔸·4d ago·2m read

This post presents the executive summary from Giving What We Can’s impact evaluation for 2025. At the end of this post we share links to more information, including the full report and...

Maybe do the thing you wish CEA would do

alejoacelas 🔸·1d ago·2m read

I used AI to fix transcription errors, rerrarange the ideas, and suggest tweaks to the title and some sentences. Three of the most exciting projects to come out of EA in recent years are, in a vague sense, CEA spinouts: * Kairos is directly a spinout of CEA and now handles most support for university AI safety groups. Basically everyone I've found who knows them is really excited about what they do * NEST is an opinionated ideas-fi...

Recent opportunities to take action

RP is looking for project founders in neglected animal areas

Rethink Priorities·2d ago·7m read

Announcing the Safe Pareto Improvements (SPI) Fundamentals Program

Center on Long-Term Risk, Anthony DiGiovanni 🔸, Santeri T 🔹·1d ago·3m read

158

Possible mistake EAs are making and shout out to Pause AI UK

Michelle_Hutchinson·1w ago·4m read

Brian Lui

Great! Yes. The key part I think is this:

Advanced nanotechnology, often self-replicating
Recursive self-improvement (e.g. an intelligence explosion)
Superhuman manipulation skills (e.g. it can convince anyone of anything)
There are exceptions to this, like the example I discuss in Appendix C.
I found that trying to reason about AGI risk scenarios that rely on these is hard because I keep thinking that these possibly run into physical limitations that deserve more thought before thinking they are plausible enough to substantially affect my thinking. It occurred to me it would be fruitful to reason about AGI risk taking these options off the table to focus on other reasons one might suspect AGIs would have overwhelming power:
Speed¹ (The system has fast reaction times)
Memory (The system could start with knowledge of all public data at the time of its creation, and any data subsequently acquired would be remembered perfectly)
Superior strategic planning (There are courses of actions that might be too complex for humans to plan in a reasonable amount of time, let alone execute)

Computers are very fast. This accords with people's experience.
Computers can store a lot of data. People can understand this, too.
Superior strategic planning might be slightly trickier, but it's still easy to grasp, because people know that computers can beat the strongest humans at chess and go.

Effective Persuasion For AI Alignment Risk

Outline

My experience of learning about AI alignment

The current persuasion approach is ineffective

The current persuasion approach is correct!

An alternative, effective approach

We should use the effective approach, not the ineffective approach