I really appreciate your and @Katja_Grace's thoughtful responses, and wish more of this discussion had made it into the manuscript. (This is a minor thing, but I also didn't love that the response rate/related concerns were introduced on page 20 [right?], since it's standard practice—at least in my area—to include a response rate up front, if not in the abstract.) I wish I had more time to respond to the many reasonable points you've raised, and will try to come back to this in the next few days if I do have time, but I've written up a few thoughts here.
“Note that we didn't tell them the topic that specifically.”
I understand that, and think this was the right call. But there seems to be consensus that in general, a response rate below ~70% introduces concerns of non-response bias, and when you're at 15%—with (imo) good reason to think there would be non-response bias—you really cannot rule this out. (Even basic stuff like: responders probably earn less money than non-responders, and are thus probably younger, work in academia rather than industry, etc.; responders are more likely to be familiar with the prior AI Impacts survey, and all that that entails; and so on.) In short, there is a reason many medical journals have a policy of not publishing surveys with response rates below 60%; e.g., JAMA asks for >60%, less prestigious JAMA journals also ask for >60%, and BMJ asks for >65%. (I cite medical journals because their policies are the ones I'm most familiar with, not because I think there's something special about medical journals.)
“Tried sending them $100 last year and if anything it lowered the response rate.”
I find it a bit hard to believe that this lowered response rates (was the difference statistically significant?), although I would buy that it didn't increase response rates much, since I think I remember reading that response rates fall off pretty quickly as compensation for survey respondents increases. I also appreciate that you're studying a high-earning group of experts, making it difficult to incentivize participation. That said, my reaction to this is: determine what the higher-order goals of this kind of project are, and adopt a methodology that aligns with them. I have a hard time believing that, at this price point, conducting a survey with a 15% response rate is the optimal methodology.
“If you are inclined to dismiss this based on your premise ‘many AI researchers just don’t seem too concerned about the risks posed by AI’, I'm curious where you get that view from, and why you think it is a less biased source.”
My impression stems from conversations I've had with two CS professor friends about how concerned the CS community is about the risks posed by AI. For instance, last week, I was discussing the last AI Impacts survey with a CS professor (who has conducted surveys, as have I); I was defending the survey, and they were criticizing it for reasons similar to those outlined above. They said something to the effect of: the AI Impacts survey results do not align with my impression of people's level of concern based on discussions I've had with friends and colleagues in the field. And I took that seriously, because this friend is EA-adjacent; extremely competent, careful, and trustworthy; and themselves sympathetic to concerns about AI risk. (I recognize I'm not giving you enough information for this to be at all worth updating on for you, but I'm just trying to give some context for my own skepticism, since you asked.)
Lastly, as someone immersed in the EA community myself, I think my bias is—if anything—in the direction of wanting to believe these results, but I just don't think I should update much based on a survey with such a low response rate.
I think this is going to be my last word on the issue, since I suspect we'd need to delve more deeply into the literature on non-response bias/response rates to progress this discussion, and I don't really have time to do that, but if you/others want to, I would definitely be eager to learn more.
I earn about $15/hour and donate much more than 1%. I don't think it's that hard to do this, and it seems weird to set such a low bar.
No, because the response rate wouldn't be 100%; even if it doubled to 30% (which I doubt it would), the cost would still be lower ($120k).
I appreciate that a ton of work went into this, and the results are interesting. That said, I am skeptical of the value of surveys with low response rates (in this case, 15%), especially when those surveys are likely subject to non-response bias, as I suspect this one is, given: (1) many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey and (2) those researchers would likely have answered the questions on the survey differently. (I do appreciate that the authors took steps to mitigate the risk of non-response bias at the survey level, and did not find evidence of this at the question level.)
I don’t find the “expert surveys tend to have low response rates” defense particularly compelling, given: (1) the loaded nature of the content of the survey (meaning bias is especially likely), (2) the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise), (3) the fact that expert surveys often do have higher response rates (26% is a lot higher than 15%), especially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well, and (4) the possibility that many expert surveys just aren’t very useful.
Given the non-response bias issue, I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey. I recognize that the survey may have value independent of its knowledge value—for instance, I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely. That said, I don’t think we should make a habit of citing work that has methodological issues simply because such results may be compelling to people who won’t dig into them.
Given my aforementioned concerns, I wonder whether the cost of this survey can be justified (am I calculating correctly that $138,000 was spent just compensating participants for taking this survey, and that doesn’t include other costs, like those associated with using the outside firm to compensate participants, researchers’ time, etc?). In light of my concerns about cost and non-response bias, I am wondering whether a better approach would instead be to randomly sample a subset of potential respondents (say, 4,000 people), and offer to compensate them at a much higher rate (e.g., $100), given this strategy could both reduce costs and improve response rates.
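To make the comparison concrete, here is a rough back-of-envelope sketch of that alternative. (The 30% figure is the hypothetical doubled response rate I mention elsewhere in this thread, not a prediction, and I'm assuming only people who actually complete the survey are paid.)

```python
def survey_cost(n_invited, response_rate, payment_per_respondent):
    """Expected compensation cost if only completers are paid."""
    return n_invited * response_rate * payment_per_respondent

# Proposed alternative: randomly sample 4,000 people, offer $100 each.
# Even if the response rate doubled from 15% to 30%:
proposed = survey_cost(4_000, 0.30, 100)
print(proposed)  # 120000.0 -- still below the ~$138,000 reportedly spent
```

So even under an optimistic response-rate assumption, the higher-incentive design costs less than what was apparently spent on compensation alone.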
I like the general idea, but a bit of feedback:
I'm not super compelled by the arguments for only asking people to donate 1%, which strikes me as a trivial amount of money, especially in the context of (1) doctors' and other healthcare workers' salaries (in the US, physicians make $350k on average) and (2) the fact that 2/3 of the US population gives an average of 4% per year (I dunno how reliable this stat is, or how this rate compares to other countries, but I'm inclined to think that the 1/3 who don't give will not respond to this initiative).
I understand not wanting to make the perfect the enemy of the good here, but I think the biggest risk of only asking people to donate 1% is inadvertently normalizing 1% as a reasonable amount for healthcare workers to donate. (I am still a trainee, and I comfortably donate way more than this! I'm also a bit reluctant to take the pledge myself, because I don't want people to think that I only donate 1%, or that I endorse donating 1%.)
I think it makes most sense to target the healthcare workers who already donate some (and who on average likely donate >1% already), and I suspect the best thing to do would be to focus on (1) getting them to pledge a more significant amount than they're already donating (5%?), (2) specifically encouraging them to give to charities that are supported by a lot of evidence (e.g., GiveWell's Top Charities fund, rather than GiveWell's All Grants Fund), and (3) focusing on health-oriented interventions (which most of GiveWell's are already, but this would be good to highlight explicitly).
“double-voting would surge as people learned you get a freebie.”
I just don’t see this happening?
Separately, one objection I have to cracking down hard on self-voting is that I think this is not very harmful relative to other ways in which people don’t vote how they’re “supposed to.” E.g., we know the correlation between upvotes and agree votes is incredibly high, and downvoting something solely because you disagree with it strikes me as more harmful to discourse on the forum than self-voting. I think the reason self-voting gets highlighted isn’t because it’s especially harmful, it’s just because it’s especially catchable.
If the mods want to improve people’s voting behavior on the forum, I both wish they’d target different voting behavior (i.e., the agree-vote/upvote correlation) and use different means to do it (e.g., generating reports for people of their own voting correlations, whether they tend to upvote/downvote certain people, etc.), rather than naming/shaming people for self-voting.
I feel like this is getting really complicated and ultimately my point is very simple: prevent harmful behavior via the least harmful means. If you can get people to not vote for themselves by telling them not to, then just… do that. I have a really hard time imagining that someone who was warned about this would continue to do it; if they did, it would be reasonable to escalate. But if they’re warned and then change their behavior, why do I need to know this happened? I just don’t buy that it reflects some fundamental lack of integrity that we all need to know about (or something like this).
This is just a weird way to think about evidence, imo. I think the original post would’ve been more useful and persuasive (and generated better discourse) if it had been 1/5th as long. Throwing evidence—even high-quality evidence—at people does not always make them reason better, and often makes them reason worse. (I also don’t think it works here to say “just have better epistemics!” because (a) one important sense in which we’re all boundedly rational is that our ability to process information well decreases as the volume of information increases and (b) a writer acting in good faith—who wants you to reach the right conclusions—should account for this in how they present information.)
Critically, as previously stated, I think the photos constitute particularly poor evidence: the ratio of useful information they provide to their likelihood of swaying people in irrational ways is very low. This is why my comment wasn’t just “shorten your post so people can understand it better,” but rather “I think these photos will lead to vibes-based reasoning.” (This is also why defense attorneys and the like use this kind of evidence; it’s meant to make the jury think “aw, they look so happy together! He couldn’t possibly have done that,” when in reality, the photo of the smiling couple on vacation has ~0 bearing on whether he murdered her.)
Thanks, this is also helpful! One thing to think about (no need to tell me) is whether making the checks public could effectively disincentivize the bad behavior, much as warnings about speed cameras may disincentivize speeding as effectively as the cameras themselves. But if there are easy workarounds, I can see why this wouldn’t be viable.