Why EAs are skeptical about AI Safety

Lukas Trötzmüller🔸

35 min read, Jul 18, 2022

Comments 31

Roddy MacSween

I think it would be interesting to have various groups (e.g. EAs who are skeptical vs worried about AI risk) rank these arguments and see how their lists of the top ones compare.

Yonatan Cale

Nice quality user research!

Consider adding a TL;DR including your calls to action - looking for collaborators and ideas for future projects, which I think will interest people

D_M_x

Thanks for doing this!

The strength of the arguments is very mixed as you say. If you wanted to find good arguments, I think it might have been better to focus on people with more exposure to the arguments. But knowing more about where a diverse set of EAs is at in terms of persuasion is good too, especially for AI safety community builders.

niplav

This solidifies a conclusion for me: when talking about AI risk, the best/most rigorous resources aren't the ones which are most widely shared/recommended (rigorous resources are e.g. Ajeya Cotra's report on AI timelines, Carlsmith's report on power-seeking AI, Superintelligence by Bostrom or (to a lesser extent) Human Compatible by Russell).

Those might still not be satisfying to skeptics, but are probably more satisfying than "short stories by Eliezer Yudkowsky" (though one can take an alternative angle: skeptics wouldn't bother reading a >100 page report, and I think the complaint that it's all short stories by Yudkowsky comes from the fact that that's what people actually read).

Additionally, there appears to be a perception that AI safety research is limited to MIRI & related organisations, which definitely doesn't reflect the state of the field—but from the outside this multipolarity might be hard to discover (outgroup-ish homogeneity bias strikes again).

[anonymous]

Personally I find Human Compatible the best resource of the ones you mentioned. If it were just the others I'd be less bought into taking AI risk seriously.

niplav

I agree that it occupies a spot on the layperson-understandability/rigor Pareto-frontier, but that frontier is large and the other things I mentioned are at other points.

[anonymous]

Indeed. It just felt more grounded in reality to me than the other resources, which may be why it appeals more to us laypeople, while non-laypeople prefer more speculative and abstract material.

Oliver Sourbut

Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!', while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.

Lukas Trötzmüller🔸

I'm not quite sure I read the first two paragraphs correctly. Are you saying that Cotra, Carlsmith and Bostrom are the best resources but they are not widely recommended? And people mostly read short posts, like those by Eliezer, and those are accessible but might not have the right angle for skeptics?

niplav

Yes, I think that's a fair assessment of what I was saying.

Maybe I should have said that they're not widely recommended enough on the margin, and that there are surely many other good & rigorous-ish explanations of the problem out there.

I'm also always disappointed when I meet EAs who aren't deep into AI safety but curious, and the only things they have read are the List of Lethalities & the Death with Dignity post :-/ (which are maybe true but definitely not good introductions to the state of the field!)

Pablo

As a friendly suggestion, I think the first paragraph of your original comment would be less confusing if the parenthetical clause immediately followed "the best/most rigorous resources". This would make it clear to the reader that Cotra, Carlsmith, et al are offered as examples of best/most rigorous resources, rather than as examples of resources that are widely shared/recommended.

niplav

Thanks, will edit.

Guy Raveh

There are short stories by Yudkowsky? All I ever encountered were thousands-of-pages-long sequences of blog posts (which I hence did not read, as you suggest).

Yonatan Cale

Lots of it is here

Lumpyproletariat

If you're unconvinced about AI danger and you tell me what specifically are your cruxes, I might be able to connect you with Yudkowskian short stories that address your concerns.

The ones which come immediately to mind are:

That Alien Message

Sorting Pebbles Into Correct Heaps

Quadratic Reciprocity

I think I would have found Ajeya's Cold Takes guest post on "Why AI alignment could be hard with modern deep learning" persuasive back when I was skeptical. It is pretty short. I think the reason I didn't find what you call "short stories by Eliezer Yudkowsky" persuasive was that they tended not to use concepts or terms from ML. Even the orthogonality thesis and the instrumental convergence thesis were not that convincing to me on a gut level, although I didn't disagree with the actual arguments for them, because I had the intuition that whether misaligned AI was a big deal depended on details of how ML actually worked, which I didn't know. Back then, it looked to me like most people I knew with much more knowledge of ML were not concerned about AI x-risk, so it probably wasn't a big deal.

Marshall

Thanks! I thought this was great. I really like the goals of fostering a more in-depth discussion and understanding skeptics' viewpoints.

I'm not sure about modeling a follow-up project on Skeptical Science, which is intended (in large part) to rebut misinformation about climate change. There's essentially consensus in the scientific community that human beings are causing climate change, so such a project seems appropriate.

Is there an equally high level of expert consensus on the existential risks posed by AI?
Have all of the strongest of the AI safety skeptics' arguments been thoroughly debunked using evidence, logic, and reason?

If the answer to either of these questions is "no," then maybe more foundational work (in the vein of this interview project) should be done first. I like your idea of using double crux interviews to determine which arguments are the most important.

One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).

Eli Rose🔸

Is there an equally high level of expert consensus on the existential risks posed by AI?

There isn't. I think a strange but true and important fact about the problem is that it just isn't a field of study in the same way e.g. climate science is — as argued in this Cold Takes post. So it's unclear who the relevant "experts" should be. Technical AI researchers are maybe the best choice, but they're still not a good one; they're in the business of making progress locally, not forecasting what progress will be globally and what effects that will have.

Marshall

Thanks! I agree - AI risk is at a much earlier stage of development as a field. Even as the field develops and experts can be identified, I would not expect a very high degree of consensus. Expert consensus is more achievable for existential risks such as climate change and asteroid impacts that can be mathematically modeled with high historical accuracy - there's less to dispute on empirical / logical grounds.

A campaign to educate skeptics seems appropriate for a mature field with high consensus, whereas constructively engaging skeptics supports the advancement of a nascent field with low consensus.

Chris Leong

One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).

This is a pretty good idea!

[anonymous]

We could use Kialo, a web app, to map those points and their counterarguments.

[anonymous]

I can organize a session with my AI safety novice group to build the Kialo map.

Marcel2

I have been suggesting this (and other uses of Kialo) for a while, although perhaps not as frequently or forcefully as I ought to… (I would recommend linking to the site, btw)

jacobpfau

Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees' crux?

It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.

Vaidehi Agarwalla 🔸

Do you think the wording "Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown?" might have biased your sample in some way?

E.g. I can imagine people who are very worried about alignment but don't think current approaches are tractable.

thecommexokid

In case "I can imagine" was literal, then let me serve as proof-of-concept, as a person who thinks the risk is high but there's nothing we can do about it short of a major upheaval of the culture of the entire developed world.

Lukas Trötzmüller🔸

The sample is biased in many ways: because of the places where I recruited, interviews that didn't work out because of timezone differences, people who responded too late, etc. I also started recruiting on Reddit and then dropped that in favour of Facebook.

So this should not be used as a representative sample; rather, it's an attempt to get a wide variety of arguments.

I did interview some people who are worried about alignment but don't think current approaches are tractable. And quite a few people who are worried about alignment but don't think it should get more resources.

Referring to my two basic questions listed at the top of the post, I had a lot of people say "yes" to (1). So they are worried about alignment. I originally planned to provide statistics on agreement / disagreement on questions 1/2 but it turned out that it's not possible to make a clear distinction between the two questions - most people, when discussing (2) in detail, kept referring back to (1) in complex ways.

Marcel2

Once again, I’ll say that a study which analyzed the persuasion psychology/sociology of “x-risk from AI” (e.g., what lines of argument are most persuasive to what audiences, what’s the “minimal distance / max speed” people are willing to go from “what is AI risk” to “AI risk is persuasive,” how important are expert statements vs. theoretical arguments, what is the role of fiction in magnifying or undermining AI x-risk fears) seems like it would be quite valuable.

Although I’ve never held important roles or tried to persuade important people, in my conversations with peers I have found it difficult to walk the line between “sounding obsessed with AI x-risk” and “underemphasizing the risk,” because I just don’t have a good sense of how fast I can go from someone being unsure of whether AGI/superintelligence is even possible to “AI x-risk is >10% this century.”

tlevin

Just added a link to the "A-Team is already working on this" section of this post to my "(Even) More EAs Should Try AI Safety Technical Research," where I observe that people who disagree with basically every other claim in this post still don't work on AI safety because of this (flawed) perception.

Kabir Kumar

I'd be very interested to see how many of them have changed their minds now.

Locke

Did any of these arguments change your beliefs about AGI? I'd always love to get a definition of General Intelligence, since that seems like an ill-posed concept.

Why EAs are skeptical about AI Safety

Summary

Introduction

Methodology

How to read these arguments

Demographics

Current level of EA involvement

Experience with AI Safety

Are you working professionally with AI or ML?

Personal Remarks

Part 1: Arguments why existential risk from AGI seems implausible

Progress will be slow / Previous AI predictions have been wrong

AGI development is limited by a "missing ingredient"

AI will not generalize to the real world

Great intelligence does not translate into great influence

AGI development is limited by neuroscience

Many things would need to go wrong

AGI development is limited by training

Recursive self-improvement seems implausible

AGI will be used as a tool with human oversight

Humans collaborating are stronger than one AGI

Constraints on Power Use and Resource Acquisition

It is difficult to affect the physical world without a body

AGI would be under strict scrutiny, preventing bad outcomes

Alignment is easy

Alignment is impossible

We will have some amount of alignment by default

Just switch it off

AGI might have enormous benefits that outweigh the risks

Civilization will not last long enough to develop AGI

People are not alarmed

Part 2: Arguments why AI Safety might be overrated within EA

The A-Team is already working on this

We are already on a good path

Concerns about community, epistemics and ideology

We might overlook narrow AI risk

We might push too many people into the AI safety field

I want more evidence

The risk is too small

Small research institutes are unlikely to have an impact

Some projects have dubious benefits

Why don't we just ban AI development?

AI safety research is not communicated clearly enough for me to be convinced

Investing great resources requires a great justification

EA might not be the best platform for this

Long timelines mean we should invest less

We still have to prioritize

We need a better Theory of Change

Rogue Researchers

Recent Resources on AI Safety Communication & Outreach

Ideas for future projects

Looking for Collaborators