
tldr: I conducted a series of interviews with 11 AI researchers to discuss AI safety; the transcripts are located here: TRANSCRIPTION LINK. If you are interested in doing outreach with AI researchers, I highly recommend taking a look!

[Cross-posted to LessWrong.]


Overview

I recently conducted a series of interviews with 11 AI researchers, wherein I laid out some reasons to be concerned about long-term risks from AI.

These semi-structured interviews were 40-60 minutes long and conducted on Zoom. Interviewees were cold-emailed, were paid for their participation, and agreed that I may share their anonymized transcripts.

Six of the interviews were with researchers who had papers accepted at NeurIPS or ICML in 2021. Five of the interviews were with researchers who were informally categorized as “particularly useful to talk to about their opinions about safety” (generally more senior researchers at specific organizations).  

I’m attaching the raw transcripts from these 11 interviews at the following link. I’ve also included the approximate script I was following, the post-interview resources I sent to interviewees, and informal interview notes in the associated “README” doc. Ideally I’d have some analysis too, and hopefully will in the future. However, I think it’s useful, particularly for people who plan to start similar projects, to read through a couple of these interviews to get an intuitive feel for what conversations with established AI researchers can feel like.

Note: I also interviewed 86 researchers for a more complete academic, under-IRB study (whose transcripts won’t be released publicly), whose results will be posted about separately on LessWrong once I finish analyzing the data. There will be substantially more analysis and details in that release; this is just to get some transcripts out quickly. As such, I won't be replying to a lot of requests for details here.

Thanks to Sam Huang, Angelica Belo, and Kitt Morjanova, who helped clean up the transcripts! 


Personal notes 

  • I worry that sometimes young EAs think of researchers like a crowd of young, influenceable proto-EAs. One of my models about community-building in general is that there are many types of people, some of whom will be markedly more sympathetic to AI safety arguments than others, and saying the same things that would convince an EA to someone whose values don’t align will not be fruitful. A second model is that older people who are established in their careers will have more formalized world models and will be more resistant to change. This means that changing one’s mind requires much more of a dialogue and integration of ideas into a world model than with younger people. The thing I want to say overall: I think changing minds takes more careful, individual-focused or individual-type-focused effort than would be expected initially.
  • I think one’s attitude as an interviewer matters a lot for outcomes. Like in therapy, which is also about changing beliefs and behaviors, I think the relationship between the two people substantially influences openness to discussion, separate from the persuasiveness of the arguments. I also suspect interviewers might have to be decently “in-group” to have these conversations with interviewees. However, I expect that that in-group-ness could take many forms: college students working under a professor in their school (I hear this works for the AltProtein space), graduate students (faculty frequently do report their research being guided by their graduate students) or colleagues. In any case, I think the following probably helped my case as an interviewer: I typically come across as noticeably friendly (also AFAB), decently-versed in AI and safety arguments, and with status markers. (Though this was not a university-associated project, I’m a postdoc at Stanford who did some AI work at UC Berkeley).
  • I’d be very excited if someone wants to take my place and continue on this project with one-on-one discussions with researchers. There’s a good chance I won’t have time to do this anymore, but I’m still very enthusiastic about this project. If you think you’d be a good fit, please reach out (vaelgates@gmail.com)! Things that help: friendliness, as much familiarity with AI arguments as you can manage (I also learned a lot about gaps in my models through these interviews), conscientiousness, and some way to get access to researchers. Extroversion would have helped me a lot too, but can’t have everything.

TRANSCRIPTION LINK

Comments (14)



Thanks, I think this is really valuable.

Two authors gave me permission to publish their transcripts non-anonymously! Thus:

- Interview with Michael L. Littman

- Interview with David Duvenaud

[anonymous]

This is great work, I think it's really valuable to get a better sense of what AI researchers think of AI safety.

Often when I ask people in AI safety what they think AI researchers think of AGI and alignment arguments, they don't have a clear idea and just default to some variation on "I'm not sure they've thought about it much". Yet as these transcripts show, many AI researchers are well aware of AI risk arguments (in my anecdotal experience, many have read at least part of Superintelligence) and have more nuanced views. So I'm worried that AI safety is insular with respect to mainstream AI researchers' thinking on AGI. These are people who in many cases have spent their working lives thinking about AGI, so their thoughts are highly valuable, and this work goes some way to reversing that insularity.

A nice followup direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counter arguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?

Indeed! I've actually found that in most of my interviews people haven't thought about the 50+ year future much or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought about AI alignment uncommonly much, which didn't particularly surprise me given how they were selected.)

> A nice followup direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counter arguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?

Yes, with the note that the arguments brought forth are generally less carefully thought through than the ones shown in the individually selected population, due to the larger population. But you can get a sense of some of the types of arguments in the six transcripts from NeurIPS / ICML researchers, though I wouldn't say it's fully representative.
 

I've been finding "A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]" to have a lot of content that would likely be interesting to the audience reading these transcripts. For example, the incentives section rhymes with the type of things interviewees would sometimes say. I think the post generally captures and analyzes a lot of the flavor / contextualizes what it was like to talk to researchers.

I am currently pursuing a couple of projects that are intended to appeal to the sensibilities of AI researchers who aren't in the alignment community already. This has already been very useful for informing the communications and messaging I would use for those. I can see myself referring back to this often, when pursuing other field building activities too. Thanks a lot for publishing this!

With all the “AI psychology” posts on here and Twitter, I thought this was going to be “interviews with AIs that are researchers” not “interviews with humans researching AI”. This is probably more valuable!

Fantastic work. And thank you for transcribing!

I would find this a bit more readable if you compiled everything into one big pdf.

Thanks for doing this! The interviews are really interesting to read; the CEO example seems like something which perhaps should gain more prominence as a way to introduce AI-knowledgeable audiences to risks about alignment.

This is fantastic, thank you! 

Is there a summary of the main insights/common threads from the interviews? 

Still in progress as always, but this talk covers a lot of it! https://forum.effectivealtruism.org/posts/q49obZkQujkYmnFWY/vael-gates-risks-from-advanced-ai-june-2022

(Unfortunately the part about the insights isn't transcribed-- the first 20m is introduction, next ~30m is the description you want, last 10m is questions)

 

Thanks Vael!

This is such an incredibly useful resource, Vael! Thank you so much for your hard work on this project.

I really hope this project continues to go strong!
