Vael Gates

Postdoc at Stanford, working on a set of interviews of AI researchers! (they/them)

Comments

Transcripts of interviews with AI researchers

I've been finding "A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]" to have a lot of content that would likely interest the audience reading these transcripts. For example, the incentives section echoes the kinds of things interviewees would sometimes say. I think the post generally captures and analyzes a lot of the flavor of these conversations, and contextualizes what it was like to talk to researchers.

[$20K In Prizes] AI Safety Arguments Competition

This isn't particularly helpful since it's not sorted, but here are some transcripts of interviews with ML researchers: https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers

My argument structure within these interviews was basically to ask these three questions in order, then respond from there. I chose the questions at the outset, but the details of the spiels accumulated as I talked to researchers and started trying to address their comments before they made them.

1. “When do you think we’ll get AGI / capable / generalizable AI / have the cognitive capacities to have a CEO AI if we do?”

  • Example dialogue: “All right, now I'm going to give a spiel. So, people talk about the promise of AI, which can mean many things, but one of them is getting very general, capable systems, perhaps with the cognitive capabilities to replace all current human jobs, so you could have a CEO AI or a scientist AI, etcetera. And I usually think about this in the frame of 2012: we have the deep learning revolution, we've got AlexNet, GPUs. Ten years later, here we are, and we've got systems like GPT-3, which have kind of weirdly emergent capabilities. They can do some text generation and some language translation and some code and some math. And one could imagine that if we continue pouring in all the human investment that we're pouring into this (money, competition between nations, human talent, so much talent, and training all the young people up), and if we continue to have algorithmic improvements at the rate we've seen and continue to have hardware improvements, so maybe we get optical computing or quantum computing, then one could imagine that eventually this scales up to quite general systems, or maybe we hit a limit and we have to do a paradigm shift in order to get to the highly capable AI stage. Regardless of how we get there, my question is: do you think this will ever happen, and if so, when?”


2. “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”

  • Example dialogue: “Alright, so these next questions are about these highly intelligent systems. So imagine we have a CEO AI, and I'm like, "Alright, CEO AI, I wish for you to maximize profit, and try not to exploit people, and don't run out of money, and try to avoid side effects." And this might be problematic, because currently we're finding it technically challenging to translate human values, preferences, and intentions into mathematical formulations that can be optimized by systems, and this might continue to be a problem in the future. So what do you think of the argument "Highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous"?”

3. “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”

  • Example dialogue: “Alright, the next question is: so we have a CEO AI, and it's optimizing for whatever I told it to, and it notices that at some point some of its plans are failing, and it's like, "Well, hmm, I noticed my plans are failing because I'm getting shut down. How about I make sure I don't get shut down? So if my loss function is something that needs human approval, and the humans want a one-page memo, then I can just give them a memo that doesn't have all the information, and that way I'm going to be better able to achieve my goal." So I'm not positing that the AI has a survival function built into it; rather, as an agent optimizing for goals that are maybe not perfectly aligned, it would develop these instrumental incentives. So what do you think of the argument, "Highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous"?”

Transcripts of interviews with AI researchers

Indeed! I've actually found that in most of my interviews people haven't thought much about the 50+ year future or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought unusually much about AI alignment, which didn't particularly surprise me given how they were selected.)

A nice follow-up direction to take this would be to get a list of common arguments used by AI researchers to be less worried about AI safety (or about working on capabilities, which is separate), counterarguments, and possible counter-counterarguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?

Yes, with the note that the arguments brought forth are generally less carefully thought through than the ones shown in the individually selected population, due to the larger sample. But you can get a sense of some of the types of arguments in the six transcripts from NeurIPS / ICML researchers, though I wouldn't say they're fully representative.
 

What psychological traits predict interest in effective altruism?

I just did a quick-and-dirty version of this study with some of the students I'm TAing for, in a freshman class at Stanford called "Preventing Human Extinction". No promises I got all the details right, in either the survey or the analysis.

—————————————————————————————————

QUICK SUMMARY OF DATA FROM https://forum.effectivealtruism.org/posts/7f3sq7ZHcRsaBBeMD/what-psychological-traits-predict-interest-in-effective

MTurkers (n ≈ 250; I'm having a hard time extracting the exact number from what may be 1-3 different samples):
- expansive altruism (M = 4.4, SD = 1.1)
- effectiveness-focus (M = 4.4, SD = 1.1)
- 49% of MTurkers had a mean score of 4+ on both scales
- 14% had a mean score of 5+ on both scales
- 3% had a mean score of 6+ on both scales

NYU students (n=96):
- expansive altruism (M = 4.1, SD = 1.1)
- effectiveness-focus (M = 4.3, SD = 1.1)
- 39% of NYU students had a mean score of 4+ on both scales
- 6% had a mean score of 5+ on both scales
- 2% had a mean score of 6+ on both scales

EAs (n=226): 
- expansive altruism (M = 5.6, SD = 0.9)
- effectiveness-focus (M = 6.0, SD = 0.8)
- 95% of effective altruist participants had a mean score of 4+ on both scales
- 81% had a mean score of 5+ on both scales
- 33% had a mean score of 6+ on both scales

——————————————————————————————————

VAEL RESULTS:

Vael personally:
- Expansive altruism:  4.2
- Effectiveness-focus:  6.3

Vael sample (Stanford freshmen taking a class called “Preventing Human Extinction” in 2022; n=27 included, one removed for lack of engagement):
- expansive altruism (M = 4.2, SD = 1.0)
- effectiveness-focus (M = 4.3, SD = 1.0)
- 48% of Vael sample participants had a mean score of 4+ on both scales
- 4% had a mean score of 5+ on both scales
- 0% had a mean score of 6+ on both scales

——————————————————————————————————

Survey link is here: https://docs.google.com/forms/d/e/1FAIpQLSeY-cFioo7SLMDuHx1w4Rll6pwuRnenvjJOfi1z8WCNNwCBiA/viewform?usp=sf_link

Data is here: https://drive.google.com/file/d/1SFLH4bGC-j0nGuy315z_HH4LwdNAiusa/view?usp=sharing

And Excel apparently decided not to save the formulas, gah. The formulas at the bottom are: =AVERAGE(K3:K29), =STDEV(K3:K29), =AVERAGE(R3:R29), =STDEV(R3:R29), =COUNTIF(V3:V29, TRUE)/COUNTA(V3:V29), =COUNTIF(W3:W29, TRUE)/COUNTA(W3:W29), and =COUNTIF(X3:X29, TRUE)/COUNTA(X3:X29). The per-row formulas are: =AND(K3>4,R3>4), =AND(K3>5,R3>5), and =AND(K3>6,R3>6), dragged down through the rest of the columns.
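For anyone who'd rather redo this outside Excel, here's a minimal Python/pandas sketch of the same analysis. It assumes the sheet has been exported to a CSV with one row per participant; the file name and the columns `expansive_altruism` / `effectiveness_focus` (each participant's mean scale score) are hypothetical stand-ins for whatever the spreadsheet actually calls them.

```python
import pandas as pd

# Hypothetical export of the spreadsheet linked above; one row per participant.
df = pd.read_csv("vael_sample.csv")

# Scale means and sample standard deviations, mirroring =AVERAGE / =STDEV.
for col in ["expansive_altruism", "effectiveness_focus"]:
    print(f"{col}: M = {df[col].mean():.1f}, SD = {df[col].std():.1f}")

# Share of participants above each threshold on BOTH scales,
# mirroring the =AND(K3>4,R3>4)-style columns plus =COUNTIF/COUNTA.
for threshold in (4, 5, 6):
    both = (df["expansive_altruism"] > threshold) & (df["effectiveness_focus"] > threshold)
    print(f"{threshold}+ on both scales: {both.mean():.0%}")
```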

Apply for Stanford Existential Risks Initiative (SERI) Postdoc

It's super cool :). I think SERI's funded by a bunch of places (including some university funding, and for sure OpenPhil), but it definitely feels incredible! 

We need alternatives to Intro EA Fellowships

Just wanted to mention that if you were planning on standardizing an accelerated fellowship retreat, it definitely seems worth reaching out to CFAR folks (as mentioned), since they spent a lot of time testing models, including for post-workshop engagement, afaik! Happy to provide names / introductions if desired.

Vael Gates's Shortform

Update on my post "Seeking social science students / collaborators interested in AI existential risks" from ~1.5 months ago: 

I've been running a two-month "program" with eight of the students who reached out to me! We've come up with research questions from my original list, and the expectation is that individuals work 9h/week as volunteer research assistants. I've been meeting with each person / group for 30min per week to discuss progress. We're halfway through this experiment, with a variety of projects in various states of progress; hopefully you'll see at least one EA Forum post go up from those students!

I was quite surprised by the interest this post generated; ~30 people reached out to me, and a large number were willing to do volunteer research for no credit / pay. I ended up working with eight students, mostly based on their willingness to work with me on some of my short-listed projects. I was willing to have their projects drift significantly from my original list if the students were enthusiastic and the project felt decently aligned with risks from long-term AI, and that did occur. My goal here was to get some experience training students who had limited research experience, and I've been enjoying working with them.

I'm not sure how likely it is that I'll continue working with students past this two-month program, because it does take up a chunk of time (made worse by trying to wrangle schedules), but I'm considering what to do in the future. If anyone's interested in also mentoring students with an interest in long-term risks from AI, please let me know, since I think there's interest! It's a decently low time commitment (30min per student or group of students) once you've got everything sorted. However, I am doing it for the benefit of the students, rather than with the expectation of getting help on my work, so it's more of a volunteer role.

Seeking social science students / collaborators interested in AI existential risks

Update: I've been running a two-month "program" with eight of the students who reached out to me! We've come up with research questions from my original list, and the expectation is that individuals work 9h/week as volunteer RAs. I've been meeting with each person / group for 30min per week to discuss progress. We're halfway through this experiment, with a variety of projects in various states of progress; hopefully you'll see at least one EA Forum post go up from those students!

--

I was quite surprised by the interest this post generated; ~30 people reached out to me, and a large number were willing to do volunteer research for no credit / pay. I ended up working with eight students, mostly based on their willingness to work with me on some of my short-listed projects. I was willing to have their projects drift significantly from my original list if the students were enthusiastic and the project felt decently aligned with risks from long-term AI, and that did occur. My goal here was to get some experience training students who had limited research experience, and I've been enjoying working with them.

I'm not sure how likely it is that I'll continue working with students past this two-month program, because it does take up a chunk of time (made worse by trying to wrangle schedules), but I'm considering what to do in the future. If anyone's interested in also mentoring students with an interest in long-term risks from AI, please let me know, since I think there's interest! It's a decently low time commitment (30min per student or group of students) once you've got everything sorted. However, I am doing it for the benefit of the students, rather than with the expectation of getting help on my work, so it's more of a volunteer role.

Vael Gates's Shortform

I think classes are great, given that they're targeting something you want to learn and you're not unusually self-motivated. They add a lot of structure and force engagement (e.g., homework, problem sets) in a way that's hard to find the time / energy for by yourself. You also get a fair amount of guidance and scaffolding, plus information presented in a pedagogical order! There's a lot of variance due to the skill and time investment of the instructor, the size of the class, the quality of the curriculum, etc.

But if you DO happen to be very self-driven, know what you want to learn, and (in a research context) are the type of person who can generate novel insights without much guidance, then heck yes, classes are inefficient. Even if you're not all of these things, it certainly seems worth trying to see if you can be, since self-learning is so accessible and one learns a lot by being focusedly confused. I like how neatly the above deep-dives idea is presented: it gives me enough structure to have a handle on it and makes it feel unusually feasible to do.

But yeah, for the people who are best at deep dives, I imagine it's hard for any class to match, even with how high-variance classes can be :). 
