I think it would be interesting to have various groups (e.g. EAs who are skeptical vs worried about AI risk) rank these arguments and see how their lists of the top ones compare.
Nice quality user research!
Consider adding a TL;DR that includes your calls to action (looking for collaborators and ideas for future projects), which I think will interest people.
Thanks for doing this!
The strength of the arguments is very mixed as you say. If you wanted to find good arguments, I think it might have been better to focus on people with more exposure to the arguments. But knowing more about where a diverse set of EAs is at in terms of persuasion is good too, especially for AI safety community builders.
This solidifies a conclusion for me: when talking about AI risk, the best/most rigorous resources aren't the ones which are most widely shared/recommended (rigorous resources are e.g. Ajeya Cotra's report on AI timelines, Carlsmith's report on power-seeking AI, Superintelligence by Bostrom or (to a lesser extent) Human Compatible by Russell).
Those might still not be satisfying to skeptics, but are probably more satisfying than "short stories by Eliezer Yudkowsky" (though one can take an alternative angle: skeptics wouldn't bother reading a >100-page report, and I think the complaint that it's all short stories by Yudkowsky comes from the fact that that's what people actually read).
Additionally, there appears to be a perception that AI safety research is limited to MIRI & related organisations, which definitely doesn't reflect the state of the field—but from the outside this multipolarity might be hard to discover (outgroup-ish homogeneity bias strikes again).
Personally I find Human Compatible the best resource of the ones you mentioned. If it were just the others I'd be less bought into taking AI risk seriously.
I agree that it occupies a spot on the layperson-understandability/rigor Pareto-frontier, but that frontier is large and the other things I mentioned are at other points.
Indeed. It just felt more grounded in reality to me than the other resources, which may be what appeals more to us laypeople, while non-laypeople prefer more speculative and abstract material.
Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!', while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.
I'm not quite sure I read the first two paragraphs correctly. Are you saying that Cotra, Carlsmith and Bostrom are the best resources but they are not widely recommended? And people mostly read short posts, like those by Eliezer, and those are accessible but might not have the right angle for skeptics?
Yes, I think that's a fair assessment of what I was saying.
Maybe I should have said that they're not widely recommended enough on the margin, and that there are surely many other good & rigorous-ish explanations of the problem out there.
I'm also always disappointed when I meet EAs who aren't deep into AI safety but curious, and the only things they have read are the List of Lethalities & the Death with Dignity post :-/ (which are maybe true but definitely not good introductions to the state of the field!)
As a friendly suggestion, I think the first paragraph of your original comment would be less confusing if the parenthetical clause immediately followed "the best/most rigorous resources". This would make it clear to the reader that Cotra, Carlsmith, et al are offered as examples of best/most rigorous resources, rather than as examples of resources that are widely shared/recommended.
Thanks, will edit.
There are short stories by Yudkowsky? All I ever encountered were thousands-of-pages-long sequences of blog posts (which I hence did not read, as you suggest).
Lots of it is here
If you're unconvinced about AI danger and you tell me what specifically are your cruxes, I might be able to connect you with Yudkowskian short stories that address your concerns.
The ones which come immediately to mind are:
That Alien Message
Sorting Pebbles Into Correct Heaps
I think I would have found Ajeya's Cold Takes guest post on "Why AI alignment could be hard with modern deep learning" persuasive back when I was skeptical. It is pretty short. I think the reason I didn't find what you call "short stories by Eliezer Yudkowsky" persuasive was that they tended not to use concepts / terms from ML. I guess even stuff like the orthogonality thesis and the instrumental convergence thesis was not that convincing to me on a gut level, even though I didn't disagree with the actual arguments for them, because I had the intuition that whether misaligned AI was a big deal depended on details of how ML actually worked, which I didn't know. To me back then it looked like most people I knew with much more knowledge of ML were not concerned about AI x-risk, so probably it wasn't a big deal.
Thanks! I thought this was great. I really like the goals of fostering a more in-depth discussion and understanding skeptics' viewpoints.
I'm not sure about modeling a follow-up project on Skeptical Science, which is intended (in large part) to rebut misinformation about climate change. There's essentially consensus in the scientific community that human beings are causing climate change, so such a project seems appropriate.
Is there an equally high level of expert consensus on the existential risks posed by AI?
Have all of the strongest of the AI safety skeptics' arguments been thoroughly debunked using evidence, logic, and reason?
If the answer to either of these questions is "no," then maybe more foundational work (in the vein of this interview project) should be done first. I like your idea of using double crux interviews to determine which arguments are the most important.
One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).
Is there an equally high level of expert consensus on the existential risks posed by AI?
There isn't. I think a strange but true and important fact about the problem is that it just isn't a field of study in the same way e.g. climate science is — as argued in this Cold Takes post. So it's unclear who the relevant "experts" should be. Technical AI researchers are maybe the best choice, but they're still not a good one; they're in the business of making progress locally, not forecasting what progress will be globally and what effects that will have.
Thanks! I agree - AI risk is at a much earlier stage of development as a field. Even as the field develops and experts can be identified, I would not expect a very high degree of consensus. Expert consensus is more achievable for existential risks such as climate change and asteroid impacts, which can be mathematically modeled with high historical accuracy - there's less to dispute on empirical / logical grounds.
A campaign to educate skeptics seems appropriate for a mature field with high consensus, whereas constructively engaging skeptics supports the advancement of a nascent field with low consensus.
This is a pretty good idea!
We could use Kialo, a web app, to map those points and their counterarguments.
I can organize a session with my AI safety novice group to build the Kialo.
I have been suggesting this (and other uses of Kialo) for a while, although perhaps not as frequently or forcefully as I ought to… (I would recommend linking to the site, btw)
Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees' crux?
It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.
Do you think the wording "Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown?" might have biased your sample in some way?
E.g. I can imagine people who are very worried about alignment but don't think current approaches are tractable.
In case "I can imagine" was literal, then let me serve as proof-of-concept, as a person who thinks the risk is high but there's nothing we can do about it short of a major upheaval of the culture of the entire developed world.
The sample is biased in many ways: because of the places where I recruited, interviews that didn't work out because of timezone differences, people who responded too late, etc. I also started recruiting on Reddit and then dropped that in favour of Facebook.
So this should not be used as a representative sample; rather, it's an attempt to get a wide variety of arguments.
I did interview some people who are worried about alignment but don't think current approaches are tractable. And quite a few people who are worried about alignment but don't think it should get more resources.
Referring to my two basic questions listed at the top of the post, I had a lot of people say "yes" to (1). So they are worried about alignment. I originally planned to provide statistics on agreement / disagreement on questions 1/2 but it turned out that it's not possible to make a clear distinction between the two questions - most people, when discussing (2) in detail, kept referring back to (1) in complex ways.
Once again, I’ll say that a study which analyzed the persuasion psychology/sociology of “x-risk from AI” (e.g., what lines of argument are most persuasive to what audiences, what’s the “minimal distance / max speed” people are willing to go from “what is AI risk” to “AI risk is persuasive,” how important are expert statements vs. theoretical arguments, what is the role of fiction in magnifying or undermining AI x-risk fears) seems like it would be quite valuable.
Although I’ve never held important roles or tried to persuade important people, in my conversations with peers I have found it difficult to walk the line between “sounding obsessed with AI x-risk” and “underemphasizing the risk,” because I just don’t have a good sense of how fast I can go from someone being unsure of whether AGI/superintelligence is even possible to “AI x-risk is >10% this century.”
Just added a link to the "A-Team is already working on this" section of this post to my "(Even) More EAs Should Try AI Safety Technical Research," where I observe that people who disagree with basically every other claim in this post still don't work on AI safety because of this (flawed) perception.
I'd be very interested to see how many of them have changed their minds now.
Did any of these arguments change your beliefs about AGI? I'd always love to get a definition of General Intelligence, since that seems like an ill-posed concept.