Why EAs are skeptical about AI Safety

Lukas Trötzmüller

TL;DR I interviewed 22 EAs who are skeptical about AI safety. My belief is that there is demand for better communication of AI safety arguments. I have several project ideas aimed in that direction, and am open to meeting potential collaborators (more info at the end of the post).

Summary

I interviewed 22 EAs who are skeptical about existential risk from Artificial General Intelligence (AGI), or believe that it is overrated within EA. This post provides a comprehensive overview of their arguments. It can be used as a reference to design AI safety communication within EA, as a conversation starter, or as the starting point for further research.

Introduction

In casual conversation with EAs over the past months, I found that many are skeptical of the importance of AI safety. Some have arguments that are quite well reasoned. Others are bringing arguments that have been convincingly refuted somewhere - but they simply did not encounter that resource, and stopped thinking about it.

It seems to me that the community would benefit from more in-depth discussion between proponents and skeptics of AI Safety focus. To facilitate more of that, I conducted interviews with 22 EAs who are skeptical about AGI risk. The interviews circled around two basic questions:

1. Do you believe the development of general AI can plausibly lead to human extinction (not including cases where bad actors intentionally use AI as a tool)?
2. Do you believe AI safety is overrated within EA as a cause area?

Only people who said no to (1) or yes to (2) were interviewed. The goal was to get a very broad overview of their arguments.

Methodology

My goal was to better understand the viewpoints of my interview partners - not to engage in debate or convince anyone. That being said, I did bring some counterarguments to their position if that was helpful to gain better understanding.

The results are summarized in a qualitative fashion, making sure every argument is covered well enough. No attempt was made to quantify which arguments occurred more or less often.

Most statements are direct quotes, slightly cleaned up. In some cases, when the interviewee spoke verbosely, I suggested a summarized version of their argument, and asked for their approval.

Importantly, the number of bullet points for each argument below does not indicate the prevalence of an argument. Sometimes, all bullet points correspond to a single interviewee and sometimes each bullet point is from a different person. Sub-points indicate direct follow-ups or clarifications from the same person.

Some interviewees brought arguments against their own position. These counterarguments are only mentioned if they are useful to illuminate the main point.

General longtermism arguments without relation to AI Safety were omitted.

How to read these arguments

Some of these arguments hint towards specific ways in which AI safety resources could be improved. Others might seem obviously wrong or contradictory in themselves, and some might even have factual errors.

However, I believe all of these arguments are useful data. I would suggest looking behind the argument and figuring out how each point hints at specific ways in which AI safety communication can be improved.

Also, I take responsibility for some of the arguments perhaps making more sense in the original interview than they do here in this post (taken out of context and re-arranged into bullet points).

Demographics

Interview partners were recruited from the /r/EffectiveAltruism subreddit, from the EA Groups Slack channel and the EA Germany Slack channel, as well as the Effective Altruism Facebook group. The invitation text was roughly like this:

Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown? I'm doing research into people's beliefs on AI risk. Looking to interview EAs who believe that AI safety gets too much attention and is overblown.

Current level of EA involvement

How much time each week do you spend on EA activities, including your high-impact career, reading, thinking and meeting EAs?

1 hour or less: 30%
2-5 hours: 35%
more than 5 hours: 20%
more than 20 hours: 15%

Experience with AI Safety

How much time did you spend, in total, reading / thinking / talking / listening about AI safety?

less than 10 hours: 10%
10-50 hours: 50%
100-150 hours: 25%
150 hours or more: 15%

Are you working professionally with AI or ML?

55%: no
20%: it's a small part of my job
25%: yes (including: AI developer, PhD student in computational neuroscience, data scientist, ML student, AI entrepreneur)

Personal Remarks

I greatly enjoyed having these conversations. My background is having studied AI Safety for about 150 hours throughout my EA career. I started from a position of believing in substantial existential AGI risk this century. Only a small number of arguments seemed convincing to me, and I have not meaningfully changed my own position through these conversations. I have, however, gained a much deeper appreciation of the variety of counterarguments that EAs tend to have.

Part 1: Arguments why existential risk from AGI seems implausible

Progress will be slow / Previous AI predictions have been wrong

They have been predicting it's going to happen for a long time now, and really underpredicting how soon it's going to be. Progress seems a lot slower than a lot of futurists have been saying.
It is very hard to create something like AGI. We are doing all this technical development; we had 3-ton mobile phones 20 years ago and now we have smartphones that would not be imaginable back then. So, yeah, we might advance to the point where AGI is possible to exist within the next 70-80 years, but it is unlikely.
Moore's law was a law until 2016 - there are more and more walls that we run up against. It's hard to know: what is the wall we can push over and which walls are literally physical limitations of the universe?
The speed of AI development so far does not warrant a belief in human level AI any time soon.
A lot of conventional AI safety arguments are like "well, currently there are all these narrow systems that are nevertheless pretty powerful, and theoretically you can make a general version of that system that does anything" and they take for granted that making this jump is not that hard comparatively speaking. I'm sure it's not something that will be impossible forever. But at the very least, the difficulty of getting to that point is frequently being understated in AI safety arguments.
The timeframe question is really hard. I look at AI from my own field (I'm a translator). Translation was one of the fields that has been predicted to be replaced by ML for decades, well before other fields (artists). I am familiar with how often the predictions about ML capabilities have gone wrong. I know ML is quite good these days, but still in the course of my work I notice very acutely many parts where it simply just does a bad job. So I have to fix errors. This is why the timeframe question is quite hard for me.

AGI development is limited by a "missing ingredient"

Well, the fundamental piece behind justification of [AI safety] research is: a meaningful probability that something superintelligent is going to exist in the next decades... And I feel that this underlying thesis is motivated by some pieces of evidence, like success of modern ML algorithms (GPT-3)... But I feel this is not good evidence. The causality between "GPT-3 being good at mimicking speech" and "therefore superintelligence is closer than we think". I think that link is faulty because current algorithms all capture statistical relationships. This to me is a necessary condition to AGI but not sufficient. And the sufficient is not known yet. Evidence that statistical learning is very successful is not evidence that AGI is close.
- [Interviewer: Wouldn't it be possible that we discover this missing piece quite soon?] You'd have to think about what your definition of general intelligence is. If you think about human intelligence, humans have many features that statistical learning doesn't have. For example, embodiment. We exist in a world that is outside of our control. By evolution we have found ways to continue to exist and the requirement of continual existence is to develop the ability to do statistical learning ourselves... but also that statistical learning is something that has sprouted off something else, and THAT something else is the fact that we were placed in an environment and left to run for a long time. So maybe the intelligence is a consequence of that long process...
- [Interviewer: I don't quite understand. Would you say there is a specific factor in evolution that is missing in machine learning?] I think we need, in addition to statistical learning, to think "what else did evolution provide that together comprises intelligence?" And I do not think this is going to be as simple as designing a slightly better statistical learning algorithm and then all of a sudden it works. You can't just take today's AI and put it in a robot and say "now it's general."
- Therefore... My main argument against the notion that superintelligent AI is a serious immediate risk is that there is something missing (and I think everyone would agree), and for me the progress in the things that we do have does not count as evidence that we are getting closer to the thing that's missing. If anything we are no closer to the thing that's missing than we were 5 years ago. I think the missing pieces are mandatory, and we don't even know what they are.
- One could also argue that I'm moving the goalposts, but also I suppose... I feel like somebody who is worried about AGI risk now would have a lot in common with someone who is worried about AGI risk 30 years ago when Deep Blue beat Kasparov. I feel the actuality of both situations is not so different. Not much has changed in AI except there are better statistical learning algorithms. Not enough to cause concern.
- Maybe researching towards "what is the missing link towards AGI? How would we get there, is it even possible" might be more impactful than saying "Suppose we have a hypothetical superintelligence, how do we solve the control problem?"
I truly believe that there is no impetus for growth without motivation. AI does not have an existential reason, like evolution had. This constrains its growth. With computers, there is no overarching goal to their evolution. Without an existential goal for something to progress towards, there is no reason for it to progress.
- [Interviewer: Could AI training algorithms provide such a goal?] I just don't know. Because we have been running training algorithms for decades (rudimentary at first, now more advanced) but they are still idiots. They don't understand what they are doing. They may have real world applications, but it doesn't know that it's identifying cancer... it is identifying a pattern in a bunch of pixels.
- [Interviewer: But what is it that you think an AI will never be able to do?] Verbal and nonverbal communication with humans. [Explanation about how their job is in medicine and that requires very good communication with the patient]
- [Interviewer: Anything else?] You can ask the computer any number of things, but if you ask it a complex or nuanced question it's not going to give you an answer. [Interviewer: What would such a question be?] E.g. "What timeframe do you think AGI is going to come about?" - they might regurgitate some answer from an expert on the internet, but it's never going to come up with a reasonable prediction on its own. Because it doesn't understand that this is a complex question involving a bunch of fields.

AI will not generalize to the real world

I think the risk is low due to the frame problem of AI - making them fundamentally different from a conscious mind. AIs must exist within a framework that we design and become 'fitter' according to fitness functions we design, without the ability to 'think outside the box' that we have created for it. The danger would come if an untested AI with a very broad/vague goal such as "protect humanity" was given direct control of many important systems with no human decision making involved in the process, however I think a scenario such as this is highly unlikely.

Great intelligence does not translate into great influence

The world is just so complicated - if you look at a weather model that predicts just the weather, which is much simpler than the world in total, to somewhat predict it a day ahead or two that's possible, but further ahead it often fails. It's a first-order chaotic system. But the world around us is a second-order chaotic system. I don't see how you could suddenly have an instance to destroy humanity accidentally or otherwise. The world is just way too complicated - an AI couldn't just influence it easily.
I don’t think superintelligence will be all that powerful. Outcomes in the real world are dominated by fundamentally unpredictable factors, which places limits on the utility of intelligence.

AGI development is limited by neuroscience

The speed of AI development so far does not warrant even human level AGI any time soon. What really underscores that is there is a rate-limiting effect of neuroscience and cognitive science. We need to first understand human intelligence before we build AGI. AGI will not come before we have a good understanding of what makes the brain work.
In terms of how we understand intelligence, it is all biological and we don't have any idea how it really works. We have a semblance of an idea, but when we are working with AI, we are working in silicon and it is all digital, whereas biological intelligence is more like stochastic and random and probabilistic. There are millions of impulses feeding into billions of neurons and it somehow makes up a person or a thought. And we are so far away from understanding what is actually going on - the amount of processing in silicon needed to process a single second of brain activity is insane. And how do we translate this into the transistors? It is a very big problem that I don't think we have even begun to grasp.
I think future AI systems will be more general than current ones, but I don't think they will be able to do "almost anything." This strikes me as inconsistent with how AI systems, at least at the moment, actually work (implement fancy nonlinear functions of training data).
- I will admit to being surprised by how much you can get out of fancy nonlinear functions, which to me says something about how surprisingly strong the higher level patterns in our textual and visual languages are. I don't really see this as approaching "general intelligence" in any meaningful way.
- I don't think we have any real sense of how intelligence/consciousness come from the human brain, so it's not clear to me to what extent a computer simulation would be able to do this.

Many things would need to go wrong

So many different things would need to go wrong in order to produce existential risk. Including governments giving AI lots of control over lots of different systems all at once.

AGI development is limited by training

An AI's domain of intelligence is built on doing millions or billions of iterations within a particular area where it learns over time. But when you have interfaces between the AI and the human within our real world, it's not gonna have these millions of iterations that it can learn from. In particular, it is impossible (or very hard, or will not be done in practice) to create a virtual training ground where it could train to deceive real humans. (This would require simulating human minds). And thus, it will not have this capability.

Recursive self-improvement seems implausible

I'm not totally bought into the idea that as soon as we have an AI a bit smarter than us, it's automatically going to become a billion times smarter than us from recursive self-improvement. We have not seen any evidence of that. We train AI to get better at one thing and it gets better at this thing, but AI doesn't then take the reins and make itself better at the other things.
Self-improvement is doubtful. I can't think of an approach to machine learning in which this kind of behavior would be realistic. How would you possibly train an AI to do something like this? Wouldn't the runtime be much too slow to expect any sort of fast takeoff? Why would we give an AI access to its own code, and if we didn't intend to, I don't understand what kind of training data would possibly let it see positive consequences to this kind of behavior.
[Interviewer: Could a brain be simulated and then improve itself inside the computer?] I suspect this would run significantly more slowly than a real human brain. More fundamentally, I am not convinced a [digital] human even with 1000 years of life could easily improve themselves. We don't really know enough about the brain for that to be remotely feasible, and I don't see any reason an AI should either (unless it was already superintelligent).

AGI will be used as a tool with human oversight

I know it's a black box, but it's always supervised and then you overlay your human judgement to correct the eccentricities of the model, lack of data, .... It's hard for me to see a commercial application where we tell an AGI to find a solution to the problem and then we wouldn't judge the solution. So there is not much risk it would do its own thing, because it works always with humans like a tool.
If you look at what industry is doing with current AI systems, it's not designed towards AI where there isn't human supervision. In commercial applications, the AI is usually supervised and it usually doesn't have too much control. It's treated like a black box, understood like a black box.
I just don't understand why you would get the AI to both come up with the solution as well as implement it. Rather, I believe there will always be a human looking at the solution, trying to understand what it really means, before implementing it (potentially using a completely different machine).
[Interviewer: Couldn't the machine bribe its human operator?] --> No, it could not successfully bribe its operators, there would always be multiple operators and oversight.

Humans collaborating are stronger than one AGI

I think the power of human intelligence is sometimes overstated. We haven't become more intelligent over the past 10k years. The power of our level of intelligence to affect the world is just because we have organized ourselves in large groups over the past 10k years. Millions of separate actors. That makes us powerful, rather than an individual - as a single actor intelligence. So even a superintelligent actor which is just one, doesn't have all the civilizational background, could never defeat humanity with its 10k years of background.

Constraints on Power Use and Resource Acquisition

I don’t believe fast takeoff is plausible. Creation of advanced AI will be bottlenecked by real-world, physical resources that will require human cooperation and make it impossible for AGI to simply invent itself overnight.
I don't believe exponential takeoff is possible. It would require the AGI to find a way to gather resources that completely supersedes the way resources are allocated in our society. I don't see a risk where the AGI, in an exponential takeoff scenario, could get the kind of money it would need to really expand its resources. So even from an infrastructure perspective, it will be impossible. For example: Consider the current safety measures for credit card fraud online. We assume in an exponential takeoff scenario that the AI would be able to hack bank systems and override these automated controls. Perhaps it could hack a lot, but a lot of real-world hacking involves social engineering. It would need to call someone and convince that person - or hire an actor to do that. This sounds extremely far-fetched to me.
- [Interviewer: "Could the AI not just make a lot of money on the stock market?"] Yes, but the idea that an exponential AI could affect the stock exchange without causing a financial collapse is far-fetched... and even an AI with a googolplex parameters could probably not fully understand the human stock market. This would take years - someone would notice and switch the AGI off.
I don't think that silicon will ever get to the point where it will have the efficiency to run something on the level of complexity of a human mind without incredible amounts of power. Once that happens, I really doubt it could get out of control just because of the power constraints.
Well, the scaling laws with silicon are running out. They're running into physical boundaries of how small and power-efficient transistors can get. So the next generation of GPUs is just trying to dump twice as much wattage on them. The transistors are getting smaller but barely. Eventually we get to the point where quantum tunneling is going to make silicon advancement super difficult.
Yes we might be able to get AGI with megawatts of power on a supercomputer.... but
1. Could you afford to do that when you could be using the computer power to do literally anything else?
2. The AGI will probably be really dumb because it would be very inefficient, and it could not easily scale up due to power constraints. It is not going to be smart.
An AI would also be limited by the physical manufacturing speed of new microchips.
There are material science constraints to building faster computers that are difficult to circumvent.

It is difficult to affect the physical world without a body

Surely you can affect the world without having a physical body - but many things you cannot affect that way. Surely you can influence everything on the internet, but I fail to see how an AGI could, for example, destroy my desk. Yes I know there are robots, but I would think there will be possibilities to stop it as soon as we realize it's going awry.
The idea that all you need is internet access is often used to support the idea of an AGI expanding their resources. But in reality you need a physical body or at least agents who have that.
- [Interviewer: "but couldn't the AI hire someone through the internet?"] Yes, but it would be discovered. And the AGI would have to convince human beings to put their lives on the line to do this thing. There are all these requisite stages to get there. Plus get the money.
It's nice to talk about the paperclip problem, but we don't have a mechanism to describe scientifically how a machine would directly convert matter into processing power. Entirely within the realm of sci-fi. Right now, such a machine would only steal people's money, crash Netflix... But it could not affect the real world, so it's in a way less dangerous than a person with a gun.

AGI would be under strict scrutiny, preventing bad outcomes

But just being really smart doesn’t mean it is suddenly able to take over the world or eradicate humanity. Even if that were its explicit goal, that takes actual capital and time. And it’d be operating under strict scrutiny and would face extreme resistance from organizations with vastly greater resources if its plans were ever suspected. World domination is just a legitimately hard thing to do, and I don’t believe it’s feasible to devise a plan to destroy humanity and secretly amass enough resources to accomplish the task without arousing suspicion or resistance from very powerful forces.
In the end, for me to take the threat of AGI seriously, I have to believe that there’s a tipping point beyond which we can no longer prevent our eradication, the tipping point can’t be detected until we’ve already passed it, we haven’t already crossed that point, actions we take today can meaningfully reduce our probability of reaching it, and that we can actually determine what those actions are and take them. I don’t believe all of those statements are true, so I’m not all that worried about AGI.
Some would say one rogue AI is equivalent to the entire nuclear arsenal. I don't believe that. I would believe that only when I see an AI destroy a city.

Alignment is easy

If we're doing it correctly, then instrumental convergence will be no problem. And it is quite easy to define that fitness function to not allow the AI to gather more resources, for example.
An AI is just matching patterns from input to output. If we were to restrict that pattern recognition to a small subset of inputs and outputs, for example, an AI that just tries to figure out what number is being drawn, there is no risk that it's going to go and take over the world. As long as you are restricting the space it is allowed to work in, and the in-built reward system is doing a good job, as long as you get that right, there is no risk that it will misinterpret what you ask it to do.
It's unlikely that an AI intelligent enough to prevent itself from being switched off or to recognize which resources it should gather would be unintelligent enough to not know that it has stepped outside the bounds of its core directive. It would be easy to keep it aligned as long as it has a core directive that prevents it from doing these instrumental harmful things. The current risk stories are based upon AI that is designed without guardrails.
- [Interviewer: "What if someone deletes the guardrails?"] - This would be noticed within some window of time, and I do not think exponential takeoff can happen in that timeframe.
It would be easy to give the AGI limited means, and through these limited means the potential damage is limited.

Alignment is impossible

Our [human] goals are always constantly conflicting, so to think we could align an AGI with our constantly fluctuating goals is not realistic anyways.
I have no idea how you ensure such a thing is magnanimous and truly intelligent, or if it is truly intelligent, if it is going to be free to make modifications on itself. It is going to contain our own intelligence. How do we exert our will on that? Alignment is very improbable. All we can do is set it off and hope that there is some kind of natural alignment between the AI and our goals.

We will have some amount of alignment by default

The AI will absorb some human morality from studying the world - they [human morals] will not be peeled out! There will be some alignment by default because we train it in material obtained from humans.
If it becomes superintelligent, it will probably get some amount of wisdom, and it will think about morality and will probably understand the value of life and of its own existence and why it should not exterminate us.
I don't see why it would exterminate humanity. I agree it can prevent itself from being switched off, but I don't think it will try to exterminate humanity if it has at least a little wisdom.

Just switch it off

It is probably not possible, within modern security infrastructure, for a machine to prevent itself from being switched off. Even modern-day security infrastructure is good enough to do that. So even without gatekeeper or tripwire it would be quite safe, and with those additional measures, even more so.
I find it doubtful that a machine would prevent itself from being switched off. It isn't enough for an action to help pursue a goal: the AI would have to know that the action would help it reach its goal. You'd have to hard-code "being turned off is bad" into its training data somehow, or you'd have to really screw up the training (e.g. by including an "AI gets turned off" possibility in its model, or by giving low scores when the AI gets turned off in real life, rather than e.g. not counting the trial.)
- [upon further questioning] I agree that an AI could implement a multi-step plan [for achieving its ambitions]. I disagree that "don't get turned off" could be a step in this plan, in anything remotely resembling the current paradigm. Machine learning models can learn either by observing training data, or by simulating play against itself or another AI. Unless simulated play penalizes getting turned off (e.g. a bad actor intentionally creating an evil AI) instead of merely not counting such simulations (or not even allowing that as a possibility in simulations), the possibility of getting turned off can't enter the model's objective function at all.

AGI might have enormous benefits that outweigh the risks

Throughout history, technology has ultimately solved more problems than it has created, despite grave warnings with each new advancement. AI will dramatically improve humans’ ability to solve problems, and I think this will offset the new problems it creates.

Civilization will not last long enough to develop AGI

I believe there needs to be a long period of stability. I do not believe our society will be stable long enough for enough technological progress to accrue that eventually leads to AGI.
One of the reasons why AGI existential risk might be lower is that I think there are other existential risks that might come first. Those might not lead to human extinction, but to the extinction of a technological civilization that could sustain AI research and the utilization of AI. Those are also more likely to hit before end of century [than AGI risks are].

People are not alarmed

If AI safety was really such an issue, I would expect more people to say so. I find it shocking that 41% of AI researchers think that safety already gets enough attention. (referencing this Twitter post)
Many people in the general public are quite chill about it, most don't even think about it. Most people do not believe it will be an actual problem.
Even among AI researchers and AI safety researchers, most people are more confident than not that we can solve this problem in time. This, to me, seems like a strong argument that the small group of doomsayers might be wrong.

Part 2: Arguments why AI Safety might be overrated within EA

The A-Team is already working on this

All the amazing people working on AI Safety now, they will be able to prevent this existential threat. I have a lot of faith in them. These research institutes are very strong and they are working so hard on this. Even if other fields have more resources and more people, I don't think those resources & people are as competent as those in the AI safety field in many cases.
Yes, I think AI safety is overrated within EA. You have so many amazing people working on AI safety. I think they already have enough people and resources. I have met some of these people and they are truly amazing. Of course these people wouldn't do any more good if they did another topic because this is the topic they are expert on, so their resources are in the right place. And if other people want to go into AI Safety too, then I say go for it [because then they will be working from their strengths].

We are already on a good path

I agree it is important to invest a lot of resources into AI safety, but I believe that the current resources (plus the ones we expect to be invested in the future) are probably enough to fix the problem.

Concerns about community, epistemics and ideology

The AI safety community looks like a cult around Eliezer Yudkowsky (I don't trust it because it looks similar to cults I've seen). He is deliberately cultivating an above-everything mystique.
The discussion seems too unipolar, too much focussed on arguments from Eliezer.
AI safety seems like the perfect mind virus for people in tech - the thing that I work on every day already, THAT is the most important thing ever in history.
How much are you willing to trust AI researchers saying that their own problem is the most important one? "my problem will cause total human extinction"?
I think there could possibly be a bit of, when you start working on something it becomes your whole world. You start thinking about the issues and problems, and you can at some point lose objectivity. You are so integrated into fixing that problem or issue. E.g. medical students often think they have some of the diseases they are studying. It might become more real because you are thinking about it all the time. I wonder if that results in biasing people to place a lot of emphasis on it. (Possible solution: Maybe putting more emphasis on adjacent areas like explainability, there might be more opportunities there)
I believe this AI issue is becoming important within EA because we are all driven by our emotions. It feels plausible that we get wiped out (fear)... and also you get a lot of status by working on this problem. I sense that is why everyone is moved to work on this problem. It seems like a great thing to work on. And then everyone uses their intelligence to justify what they are doing (which is to work on this problem) because what else do you work on, it's a really cool problem to work on. So there's a lot of effort going on to demonstrate that this is an existential risk, because of natural human motivations. I can understand how this community has come to work on this because it feels like an important issue, and if you stop working on this, then what else would you work on? The reason everyone is working on it is that it's a very human thing caused by human emotions. [Interviewer: Would you call that "motivated reasoning"?] Yeah, everyone is motivated to get prestige and status on what seems like a very important problem, and once you have invested all that time and effort you don't want to be in a position to admit it's not such a serious problem.
I don't understand how people are so convinced about the likelihood of AI risk. I don't understand how this can even be discussed in such detail, to the point that you can even quantify some amount of risk. I am especially worried about a lot of EAs very strongly believing that this is the most likely existential risk, because I don't understand where this intuition comes from. I am worried that this belief is traceable back to a few thinkers that EAs respect a lot, like Nick Bostrom, Eliezer Yudkowsky and some others. It looks to me that a lot of arguments get traced to them, and people who are not involved in the EA community do not seem to come up with the same beliefs by themselves. This, to me, suggests there is a possibility that this has been strongly over-emphasized because of some echo chamber reasons. I believe there is some risk from AGI. But I think it is likely that this risk is very over-emphasized and people are over-confident that it is likely enough to justify focusing so many resources on AI risk.
I'm concerned about this ideology justifying doing whatever the rich and powerful of today want. I'm concerned this leads us to invest into rich countries (because AGI development is more likely to occur in the developed world). So it might take away resources from present-day suffering. There is a big risk that this might be another way for the developed world and rich people to prioritize their own causes. This is not the vibe I get from EA people at the moment. But it is not outlandish to imagine that their ideology will be misused. We cannot be naive about how people might misuse it - it's like a secular religion.

We might overlook narrow AI risk

What worries me is that this discussion is focused on AGI, but there could be other catastrophic risks related to powerful narrow AI systems. I don't see much discussion on that. To me, it looks like something is missing. People are too focused on these AGI narratives.
I think we're already seeing quite high levels of AI, e.g. in targeting ads and the way Facebook has divided society. These are the problems we should be focusing on rather than the other issue which is steps further away and staggeringly unlikely. It would be better if the efforts were focused on narrow AI safety.
I think it would be great to fund narrow AI safety research. But I also acknowledge, this is less useful for preventing x-risk [compared with focus on AGI safety].

We might push too many people into the AI safety field

The kind of phrasing and importance attributed to it takes people who would have a strong comparative advantage in some other field and has them feel like they have to work on AI safety even though they might not be as well-suited to it. If you believe there could be existential AI risk in the next 10 years that does make sense. But I am not that convinced that is the case.
It seems to me that there aren't actually that many AI safety positions available. There are not that many companies or professors working on this. There is room for independent researchers, which the AI safety camp supports, okay. But still it seems like the current wording in EA is pushing a larger amount of people into AI safety than the current capacity. And it might be difficult because we push so many people into this area and then we get people who are not as qualified and then they might feel guilty if they don't get jobs. Doesn't seem that healthy to be having all these people thinking "if I'm not working on AI then I am not contributing."

I want more evidence

We do not have factual evidence that AI kills people, and certainly not a large number of people. AI has not eradicated humanity yet. And I doubt it will ever happen, simply because previous predictions were wrong so often.
I would say this [meaning AGI existential risk scenarios] is an instance of Pascal's mugging, as we don't have enough evidence.
The difference between longtermists and non-longtermists is their degree of willingness to put trust in expert opinion.
A lot of AI writers like Yudkowsky and maybe Scott Alexander, they throw a lot of numbers around of the likelihood of AI. But that's just the reflection of someone's belief, it is not the result of actual research.
What I'm really missing is more concrete data, especially on examples of AI that have gone awry. I haven't seen enough examples from researchers being like "this is the kind of thing it has done in the past, and these are some examples of how it could really go wrong in the future". At the same time, I do believe some of the catastrophe scenarios, but I also have a big emotional thing telling me that maybe people will do the right thing and we don't need to invest that much resources into safety. This comes perhaps from a lack of communication or data from people who are doing this research. I heard lots of big arguments, but I want more specifics: Which AI things already went wrong? What unexpected results did people get with GPT-3 and DALL-E? I would also like to hear more about DeepBrain.
I wonder if I have a basic misunderstanding of EA because, as it sounds from an outsider point of view, if EA is about trying to find good opportunities for philanthropy to have the most effect, then it does seem like the focus on AI safety is overrated - there is no tangible evidence I have seen that AGI is possible to create, or even that in our present-day world there is evidence for its possibility. I haven't seen any examples that lead me to think that it could happen. If somebody showed me a nice peer-reviewed research paper and trials that said "oh yeah we have this preliminary model for an AGI" then I would be a little bit more spooked, but it's something I have never seen, whereas I do see people starving.

The risk is too small

In the framework of EA, you want to work on something that you know you can have an impact on. Given infinite resources, one should work on AGI safety definitely. But given finite resources, if you don't have sufficient knowledge of what the risk is (and in this case, the risk is vanishingly small), it is quite difficult to justify thinking about it as much as people currently are.

Small research institutes are unlikely to have an impact

Progress in AI capabilities and alignment has been largely driven by empirical work done by small groups of individuals working at highly resourced (in compute, money, and data access) organizations, like OpenAI, DeepMind, Google, etc. Theoretical work done at places like MIRI, FHI, etc. has not contributed meaningfully to this progress, aside from popularizing issues of AI alignment and normalizing discussion around them.
- Ergo, EA funding for alignment is only directly helpful if it influences these organizations directly, influences who joins these groups in the future, or influences regulators. Individuals or organizations outside of those certain few organizations are typically not able to contribute meaningfully to alignment as a result, and should not receive critical funds that should instead be spent on influencing these groups.
- I don't think [these small research institutes] will find anything (very) important in the first place because they don't have direct access to manipulate and explore the very models that are being built by these organizations. Which are the only models in my mind that are a) pushing AI capabilities forward and b) have any real likelihood of becoming AGI eventually.

Some projects have dubious benefits

One of the issues I got into EA through was global poverty stuff. Like, probably people did more 5 years ago than they do now. If someone wants to research AI risk, I'm not going to stop them. But then people say we're going to offer 1000s of dollars of prizes for imagining what hypothetical AI worlds might look like, because in theory people might be inspired by that... The causal chain is tenuous at best. I look at that and say, how many children you could save with that money. It is sort of true that EA isn't funding-constrained, but there are clearly things that you could do if you wanted to just take that money and save as many children's lives as possible. I worry that people have gotten so focused on the idea that AI risk is so important and dwarfs everything else, that they throw money at anything that is vaguely connected to the problem. People might say "anything is a good investment of money because the problem is so big."

Why don't we just ban AI development?

I have been critical about the AI safety discourse because I have never really gotten an answer to the question "if we consider the AI risk to be such a meaningful thing and such a large event", then why are we not talking about a total ban on AI development right now?

AI safety research is not communicated clearly enough for me to be convinced

I have very little understanding of what the actual transmission and safety mechanisms are for deterring existential AI risk. In climate science, for example, we are NOT limited in our ability to affect the climate positively by a lack of understanding about atmospheric science. Those details are very clear. Whereas in AI, it really does seem that everything I hear is by way of analogy - paperclip maximizers, alignment issues.... I would like to know, I think it would benefit the AI movement, to publicize some of the technical approaches by which they try to address this. And if the field is in a state where they aren't even getting to the nitty gritty (e.g. a set of optimization problems or equations), then they should publicize THAT.
- I realize this is a bit of a weird request like "hey educate me". But when I give to Givewell, I can read their reports. There have been years of soft questioning and pushback by a lot of people in the community. If there was a strong case to be made by researchers about the concrete effectiveness of their projects, I hope it would be made in a clearer and more public way. I don't want to read short stories by Eliezer Yudkowsky any more.
- Short stories by EY are great for pulling in casuals online but at this point we should have some clear idea of what problems (be they hard or soft) the community is reckoning with. The EA community is technically literate, so it would be worth exposing us to more of the nitty gritty details.

Investing great resources requires a great justification

I was struck by the trajectory of open philanthropic grantmaking allocation where global health interventions led the pack for a number of years. Now AI safety is a close second. That warrants a substantial justification in proportion to its resource outlay.

EA might not be the best platform for this

One of my criticisms of the EA movement is that I'm not sure it's the best platform for promoting investment in AI research. When you have a charitable movement and then you have people who are active in AI research saying that we need to put mounds of money into AI research, which happens to be a field that gives them their jobs and benefits them personally, then many people are going to become very suspicious of this. Is this just an attempt for all of these people to defraud us so they can live large? This is one of the reasons why I have been suspicious about the EA movement, but it's been good to meet people who are in EA but not part of the AI community. Meeting people who don't benefit from AI research makes me more positive about the EA movement. In these terms, I would also say that this is a reason why the AI research thing should be discussed on many other platforms, chiefly in the political and scientific fields, and the EA community may not be the best platform for this topic.

Long timelines mean we should invest less

Many people pretend that AI safety timelines don't matter, as if long-term timelines present the same problems as short timelines. They say AI is coming in 30-60 years, an unspoken assumption. But when you push them and say, we will be able to fix the safety problems because it will not come that soon, they say, no the arguments still stand even if you assume long timelines. My problem with the AI safety argument is that if AGI is coming in 50-60 years (short timelines), investing in AI safety research makes a lot of sense. If timelines are longer, we should still put an effort but not that much. In the short-term, more nuclear weapons and bioweapons. I don't think AGI is coming in the next 50 years. Beyond that, I'm not sure. There has been talk that current ML models, when scaled up, will get us to AGI. I don't believe that. Therefore we should spend less resources than on nuclear war and bioweapons.
Since it might be a very long time until we develop even human-level AGI, it might be a mistake to invest too many resources now.
- [Interviewer: What if it would come in 20 years?] If I actually was convinced that the timeline is like 20 years, then I would definitely say let's invest a lot of effort, even if it's not likely to succeed.
- I am pretty sure about long timelines because of past failures of predictions. And because current ML models might be very far away from real intelligence, and I don't trust these narrative speculative explanations that current ML models would get there (you cannot reliably predict that since it is not physics where you can reliably calculate things).

We still have to prioritize

I believe there are quite many existential risks, then of course you get to the point where you only have a limited amount of resources, and it looks like over the next years the resources will be more and more limited considering we are on the brink of an economic crisis. So when you have a limited amount of resources, even if we theoretically think it's worth spending an enormous amount of resources to solve AI safety, then we still have to do the tradeoffs [with other risks] at some point.
I think it has been exaggerated. It is important to me to make sure that when we give attention to AI, we should not ignore other risks that have the potential of shaping our civilization in the wrong direction.

We need a better Theory of Change

I haven't seen a clear Theory of Change for how this kind of work would actually reduce AI risk. My prior from work in development is that nice-sounding ideas without empirical backing tend to fail, and often tend to make the problem worse. So without stronger evidence that a) x-risk from AI is likely and b) working on AI risk can actually mitigate this problem, I am very strongly opposed to its status as a flagship EA cause.

Rogue Researchers

We cannot meaningfully reduce AGI risk because of rogue researchers. Do you think a human has been cloned? How do you know? If a rogue researcher decides to do it, outside the bounds of current controls... unless we are somehow rate-limiting every environment of processing that we can create, there is not a good way to prevent anyone from doing that.

Recent Resources on AI Safety Communication & Outreach

Ideas for future projects

I would like to suggest Skeptical Science, which is a library of climate change arguments, as a role model for how AI safety material can be made accessible to the general public. In particular, I like that the website offers arguments in several levels of difficulty. I would like to start something like this and am looking for collaborators (see below).
There could be an AI safety "support hotline", allowing EAs who are curious and not involved with the field to ask questions and get referred to the right resources for them. Something similar, but for people who are already considering moving into the field, is AI Safety Support.
It would be interesting to conduct these interviews in more detail using the Double Crux technique, bringing in lots of counterarguments and then really uncovering the key cruxes of interview partners.

Looking for Collaborators

I am looking for ways to improve AI safety outreach and discourse, either by getting involved in an existing project or by launching something new. Send me a message if you're interested in collaborating, would like help with your project, or would just like to bounce ideas around.

Roddy MacSween2y53

I think it would be interesting to have various groups (e.g. EAs who are skeptical vs worried about AI risk) rank these arguments and see how their lists of the top ones compare.

Yonatan Cale2y24

Nice quality user research!

Consider adding a TL;DR including your calls to action - looking for collaborators and ideas for future projects, which I think will interest people

Denise_Melchin2y19

Thanks for doing this!

The strength of the arguments is very mixed as you say. If you wanted to find good arguments, I think it might have been better to focus on people with more exposure to the arguments. But knowing more about where a diverse set of EAs is at in terms of persuasion is good too, especially for AI safety community builders.

niplav2y10

This solidifies a conclusion for me: when talking about AI risk, the best/most rigorous resources aren't the ones which are most widely shared/recommended (rigorous resources are e.g. Ajeya Cotra's report on AI timelines, Carlsmith's report on power-seeking AI, Superintelligence by Bostrom or (to a lesser extent) Human Compatible by Russell).

Those might still not be satisfying to skeptics, but are probably more satisfying than "short stories by Eliezer Yudkowsky" (though one can take an alternative angle: skeptics wouldn't bother reading a >100 page report, and I think the complaint that it's all short stories by Yudkowsky comes from the fact that that's what people actually read).

Additionally, there appears to be a perception that AI safety research is limited to MIRI & related organisations, which definitely doesn't reflect the state of the field—but from the outside this multipolarity might be hard to discover (outgroup-ish homogeneity bias strikes again).

[anonymous]2y9

Personally I find Human Compatible the best resource of the ones you mentioned. If it were just the others I'd be less bought into taking AI risk seriously.

niplav2y8

I agree that it occupies a spot on the layperson-understandability/rigor Pareto-frontier, but that frontier is large and the other things I mentioned are at other points.

[anonymous]2y2

Indeed. It just felt more grounded in reality to me than the other resources, which may appeal more to us laypeople, whereas non-laypeople may prefer more speculative and abstract material.

Oliver Sourbut2y2

Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!' while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced in presenting uncertain proposals for paths forward not as oven-ready but preliminary and speculative.

Lukas Trötzmüller2y8

I'm not quite sure I read the first two paragraphs correctly. Are you saying that Cotra, Carlsmith and Bostrom are the best resources but they are not widely recommended? And people mostly read short posts, like those by Eliezer, and those are accessible but might not have the right angle for skeptics?

niplav2y3

Yes, I think that's a fair assessment of what I was saying.

Maybe I should have said that they're not widely recommended enough on the margin, and that there are surely many other good & rigorous-ish explanations of the problem out there.

I'm also always disappointed when I meet EAs who aren't deep into AI safety but curious, and the only things they have read are the List of Lethalities & the Death with Dignity post :-/ (which are maybe true but definitely not good introductions to the state of the field!)

Pablo2y19

As a friendly suggestion, I think the first paragraph of your original comment would be less confusing if the parenthetical clause immediately followed "the best/most rigorous resources". This would make it clear to the reader that Cotra, Carlsmith, et al are offered as examples of best/most rigorous resources, rather than as examples of resources that are widely shared/recommended.

niplav2y1

Thanks, will edit.

Guy Raveh2y2

There are short stories by Yudkowsky? All I ever encountered were thousands-of-pages-long sequences of blog posts (which I hence did not read, as you suggest).

Yonatan Cale2y4

Lots of it is here

Lumpyproletariat2y2

If you're unconvinced about AI danger and you tell me what specifically are your cruxes, I might be able to connect you with Yudkowskian short stories that address your concerns.

The ones which come immediately to mind are:

That Alien Message

Sorting Pebbles Into Correct Heaps

Quadratic Reciprocity2y1

I think I would have found Ajeya's Cold Takes guest post on "Why AI alignment could be hard with modern deep learning" persuasive back when I was skeptical. It is pretty short. I think the reason why I didn't find what you call "short stories by Eliezer Yudkowsky" persuasive was that they tended not to use concepts / terms from ML. I guess even stuff like the orthogonality thesis and the instrumental convergence thesis was not that convincing to me on a gut level, even though I didn't disagree with the actual arguments for them, because I had the intuition that whether misaligned AI was a big deal depended on details of how ML actually worked, which I didn't know. To me back then it looked like most people I knew with much more knowledge of ML were not concerned about AI x-risk, so probably it wasn't a big deal.

Marshall2y9

Thanks! I thought this was great. I really like the goals of fostering a more in-depth discussion and understanding skeptics' viewpoints.

I'm not sure about modeling a follow-up project on Skeptical Science, which is intended (in large part) to rebut misinformation about climate change. There's essentially consensus in the scientific community that human beings are causing climate change, so such a project seems appropriate.

Is there an equally high level of expert consensus on the existential risks posed by AI?
Have all of the strongest of the AI safety skeptics' arguments been thoroughly debunked using evidence, logic, and reason?

If the answer to either of these questions is "no," then maybe more foundational work (in the vein of this interview project) should be done first. I like your idea of using double crux interviews to determine which arguments are the most important.

One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).

Eli Rose2y8

Is there an equally high level of expert consensus on the existential risks posed by AI?

There isn't. I think a strange but true and important fact about the problem is that it just isn't a field of study in the same way e.g. climate science is — as argued in this Cold Takes post. So it's unclear who the relevant "experts" should be. Technical AI researchers are maybe the best choice, but they're still not a good one; they're in the business of making progress locally, not forecasting what progress will be globally and what effects that will have.

Marshall2y7

Thanks! I agree - AI risk is at a much earlier stage of development as a field. Even as the field develops and experts can be identified, I would not expect a very high degree of consensus. Expert consensus is more achievable for existential risks such as climate science and asteroid impacts that can be mathematically modeled with high historical accuracy - there's less to dispute on empirical / logical grounds.

A campaign to educate skeptics seems appropriate for a mature field with high consensus, whereas constructively engaging skeptics supports the advancement of a nascent field with low consensus.

Chris Leong2y4

This is a pretty good idea!

[anonymous]2y7

We could use Kialo, a web app, to map those points and their counterarguments

[anonymous]2y4

I can organize a session with my AI safety novice group to build the Kialo

Harrison Durland2y2

I have been suggesting this (and other uses of Kialo) for a while, although perhaps not as frequently or forcefully as I ought to… (I would recommend linking to the site, btw)

jacobpfau2y6

Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees' crux?

It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.

Vaidehi Agarwalla2y5

Do you think the wording "Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown?" might have biased your sample in some way?

E.g. I can imagine people who are very worried about alignment but don't think current approaches are tractable.

thecommexokid2y1

In case "I can imagine" was literal, then let me serve as proof-of-concept, as a person who thinks the risk is high but there's nothing we can do about it short of a major upheaval of the culture of the entire developed world.

Lukas Trötzmüller2y1

The sample is biased in many ways: Because of the places where I recruited, interviews that didn't work out because of timezone difference, people who responded too late, etc. I also started recruiting on Reddit and then dropped that in favour of Facebook.

So this should not be used as a representative sample, rather it's an attempt to get a wide variety of arguments.

I did interview some people who are worried about alignment but don't think current approaches are tractable. And quite a few people who are worried about alignment but don't think it should get more resources.

Referring to my two basic questions listed at the top of the post, I had a lot of people say "yes" to (1). So they are worried about alignment. I originally planned to provide statistics on agreement / disagreement on questions 1/2 but it turned out that it's not possible to make a clear distinction between the two questions - most people, when discussing (2) in detail, kept referring back to (1) in complex ways.

Harrison Durland2y4

Once again, I’ll say that a study which analyzed the persuasion psychology/sociology of “x-risk from AI” (e.g., what lines of argument are most persuasive to what audiences, what’s the “minimal distance / max speed” people are willing to go from “what is AI risk” to “AI risk is persuasive,” how important are expert statements vs. theoretical arguments, what is the role of fiction in magnifying or undermining AI x-risk fears) seems like it would be quite valuable.

Although I’ve never held important roles or tried to persuade important people, in my conversations with peers I have found it difficult to walk the line between “sounding obsessed with AI x-risk” and “underemphasizing the risk,” because I just don’t have a good sense of how fast I can go from someone being unsure of whether AGI/superintelligence is even possible to “AI x-risk is >10% this century.”

tlevin2y4

Just added a link to the "A-Team is already working on this" section of this post to my "(Even) More EAs Should Try AI Safety Technical Research," where I observe that people who disagree with basically every other claim in this post still don't work on AI safety because of this (flawed) perception.