TLDR: Last year, Vael Gates interviewed 97 AI researchers about their perceptions of the future of AI, focusing on risks from advanced AI. Among other questions, researchers were asked about the alignment problem, the problem of instrumental incentives, and their interest in AI alignment research. When followed up with 5-6 months later, 51% reported that the interview had a lasting effect on their beliefs. Our new report analyzes these interviews in depth. Below, we describe our primary results and some implications for field-building. Check out the full report (interactive graph version) and a complementary writeup describing whether we can predict a researcher’s interest in alignment!
[Link to post on LessWrong]
This report (interactive graph version) is a quantitative analysis of 97 interviews conducted in Feb-March 2022 with machine learning researchers, who were asked about their perceptions of artificial intelligence (AI) now and in the future, with particular focus on risks from advanced AI systems. Of the interviewees, 92 were selected from NeurIPS or ICML 2021 submissions, and 5 were informally recommended experts. For each interview, a transcript was generated, and common responses were identified and tagged to support quantitative analysis. The transcripts, as well as a qualitative walkthrough of common perspectives, are available at Interviews.
Several core questions were asked in these interviews:
- When advanced AI (~AGI) would be developed (note that this term was imprecisely defined in the interviews)
- A probe about the alignment problem: “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”
- A probe about instrumental incentives: “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”
- Whether interviewees were interested in working on AI alignment, and why or why not
- Whether interviewees had heard of AI safety or AI alignment
Some key findings from our primary questions of interest:
- Most participants (75%), at some point in the conversation, said that they thought humanity would achieve advanced AI (imprecisely labeled “AGI” for the rest of this summary) eventually, but their timelines to AGI varied. Within this group:
  - 32% thought it would happen in 0-50 years
  - 40% thought 50-200 years
  - 18% thought 200+ years
  - 28% were quite uncertain, reporting a very wide range.
(These sum to more than 100% because several people endorsed multiple timelines over the course of the conversation.) (Source)
- Among participants who thought humanity would never develop AGI (22%), the most commonly cited reason was that they couldn't see AGI happening based on current progress in AI. (Source)
- Participants were pretty split on whether they thought the alignment problem argument was valid. Some common reasons for disagreement were (Source):
  - A set of responses that included the idea that AI alignment problems would be solved over the normal course of AI development (caveat: this was a very heterogeneous tag).
  - Pointing out that humans have alignment problems too (so the potential risk of the AI alignment problem is capped in some sense by how bad alignment problems are for humans).
  - AI systems will be tested (and humans will catch issues and implement safeguards before systems are rolled out in the real world).
  - The objective function will not be designed in a way that causes the alignment problem / dangerous consequences of the alignment problem to arise.
  - Perfect alignment is not needed.
- Participants were also pretty split on whether they thought the instrumental incentives argument was valid. The most common reasons for disagreement were that 1) the loss function of an AGI would not be designed such that instrumental incentives arise / pose a problem and 2) there would be oversight (by humans or other AI) to prevent this from happening. (Source)
- Some participants brought up that they were more concerned about misuse of AI than AGI misalignment (n = 17), or that potential risk from AGI was less dangerous than other large-scale risks humanity faces (n = 11). (Source)
- Of the 55 participants who were asked about (or otherwise responded on) their interest in working on AI alignment, some (n = 13) were potentially interested in working on AI alignment research. (Caveat for bias: the interviewer was less likely to ask this question if the participant believed AGI would never happen and/or the alignment/instrumental arguments were invalid, so as to reduce participant frustration. This question also tended to be asked in later interviews rather than earlier interviews.)
  - Of those participants potentially interested in working on AI alignment research, almost all reported that they would need to learn more about the problem and/or would need to have a more specific research question to work on or incentives to do so.
  - Those who were not interested reported that it was not their problem to address (they had other research priorities, interests, skills, and positions); that they would need examples of risks from alignment problems and/or instrumental incentives within current systems to be interested in this work; or that they were not at the forefront of such research and so would not be a good fit. (Source)
- Most participants had heard of AI safety (76%) in some capacity (source); fewer had heard of AI alignment (41%) (source).
- When participants were followed up with ~5-6 months after the interview, 51% reported that the interview had a lasting effect on their beliefs (source), and 15% reported that the interview caused them to take new action(s) at work (source). Additionally, some participants were asked whether they had changed their minds about anything during the interview, and 24/58 (41%) said they had (caveat for bias: the interviewer tended to avoid asking this question to people who seemed very unlikely to have changed their minds, especially those who seemed frustrated with the interview, and it was only added as an explicit question in later interviews).
- Thinking the alignment problem argument was valid, or the instrumental incentives argument was valid, both tended to correlate with thinking AGI would happen at some point. The effect wasn’t symmetric: if participants thought these arguments were valid, they were quite likely to believe AGI would happen; if participants thought AGI would happen, it was still more likely that they thought these arguments were valid but the effect was less strong. (Source)
Implications for Field Building
- Believing that AGI will happen is important for receptivity to AGI risk arguments, which means field builders should pay some attention to timelines. However, many interviewees already believed AGI would happen, and that number will likely increase over time as AI systems continue to gain capabilities.
  - Anecdotally, thinking “AGI will never happen” often represents a significant worldview difference, and should be engaged with as such.
  - Anecdotally, people who hadn’t thought about AGI or AGI timelines before were more likely to revise their estimates during the interview.
    - For a sense of how often people revised their estimates, search for “an alternative solution” here. Of the people who said at any point that AGI wouldn’t happen, 9/30 (30%) ultimately provided an estimate of when AGI might happen during the conversation (4 thought “200+ years”, 3 thought “50-200 years”, and 2 thought “0-50 years”). Checking four of these, we found that the timeline estimates were provided later in the conversation, suggesting a possible revision of the “AGI will never happen” belief.
- When talking with researchers who are skeptical that humanity will ever develop AGI, field builders should be prepared to discuss how the technical challenges of present-day systems may not prevent future AGI development. Additionally, some interviewees were unaware of the current progress of AI systems (although awareness is increasing with the commercialization of GPT-like systems), so it helps to have examples prepared. A fellow AI researcher with a strong technical background may be best situated to have this discussion.
- It seems like many of the objections to alignment problem arguments arose from an intuition that the worst-case scenario with AGI is either not particularly bad or not particularly likely. Others arose from the fact that there is inherently a lot of uncertainty about the future. Field builders should expect to encounter such perspectives.
- Many researchers think, "other people are on top of this." They believe entities like society, leading AI companies, or the government have, or will have in the future, regulations and safety measures that will prevent things from getting out of control. They may be interested to hear how much slower, smaller, and less well-funded the safety community is compared with the communities building these systems.
- Field builders should provide clear guidance on what distinguishes risks from advanced AI from concerns about human misalignment / bad actors. It’s also often important to separate out “the problems that may arise from advanced AI” and “the problems that are arising from current-day AI” (e.g. misuse, near-term safety) as overlapping but distinct problems.
- Researchers potentially interested in working on AI alignment research commonly wanted to know the scope of AI alignment research, and the potential overlap with their own work. In particular, researchers were looking for concrete, specific technical problems they could work on using their available skill sets. Providing updated, technical research fit information is likely quite important for getting these researchers further involved.
- Anecdotally, the more philosophical presentations of risk in these interviews sometimes left researchers with the impression that there weren’t technical problems in AI alignment that their skills could address. For this and related reasons, we believe that, for AI researchers, the philosophical presentation of AI risk arguments used in these interviews is less effective than a more technical, research-oriented presentation.
- People who submitted papers to major AI conferences (NeurIPS or ICML) vary in their familiarity with AI safety and especially AI alignment. Field builders should have several presentations available, with different amounts of background and ties to neighboring fields.
- We think these interviews went generally well: many participants reported it had lasting effects on their beliefs and some even took new actions at work because of it. Several interviewees reported changing their minds over the course of the conversation and the exchange was positive enough that 95% responded to recontact efforts. There were likely many factors that contributed to this relative success, but the long-form one-on-one format was likely one of them.
While the current report summarizes responses from the primary questions asked in the interview, there is also an accompanying writeup of how responses interact with each other and other participant features: i.e., whether we can use demographics, or other information about the researchers, to predict their interest (or lack thereof) in AI alignment. Take a look here! Predicting researcher interest in AI alignment
More Information, Full Report, and Further Posts
The above is a relatively concise description of the interviews and results. To read more, see below:
- Full Report (Interactive Graph Version)
- Predicting researcher interest in AI alignment, an accompanying writeup summarizing how responses interact with each other and other participant features
- Interviews, to check out the interview transcripts and further resources
- AI Risk Discussions (EA Forum Post), to explore the broader website that houses this work
We welcome feedback!
Analysis and writing by Maheen Shermohammed, with help from Vael Gates.
Interviews were conducted by Vael Gates, with guidance from Mary Collier Wilks.
Tagging was completed by Zi Cheng (Sam) Huang and Vael Gates.
Copyediting of the report by David Spearman, and copyediting of this writeup by Lukas Trötzmüller.
This project was supported by the AI Safety Field-Building Hub.
The eagle-eyed may note that this is out of 30, despite an earlier statement that only 22% (i.e. 21 people) thought humanity would never develop AGI. That's because the 9 who had timeline tags (meaning they also expressed some belief that AGI would happen) were excluded from the 22% estimate. (Source)
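The count reconciliation in the note above is simple arithmetic. As a minimal sketch, the Python snippet below reproduces it using only figures stated in this post: 97 interviewees, 30 who said at some point that AGI wouldn't happen, and the 4 + 3 + 2 of those who later gave a timeline. The variable names are illustrative assumptions, not taken from the report's own analysis code.

```python
# Reconciling the "AGI will never happen" counts using only figures stated in this post.
TOTAL_PARTICIPANTS = 97

# Participants who said at some point that AGI wouldn't happen.
ever_said_never = 30

# Of those, how many later gave a timeline: "200+ years", "50-200 years", "0-50 years".
later_gave_timeline = 4 + 3 + 2

# Anyone with a timeline tag was excluded from the "never" group.
never_only = ever_said_never - later_gave_timeline

share_revised = later_gave_timeline / ever_said_never  # share of "never" sayers who gave a timeline
share_never = never_only / TOTAL_PARTICIPANTS          # share of all interviewees

print(f"{later_gave_timeline}/{ever_said_never} = {share_revised:.0%}")
print(f"{never_only}/{TOTAL_PARTICIPANTS} = {share_never:.0%}")
```

Running it prints the two figures quoted in the post: 9/30 (30%) and 21/97, which rounds to 22%.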