Terrific overview! I'll offer some feedback with the hope that some of it may be helpful:
Big Picture Thoughts
Potentially useful points that I didn't see in the report:
Examples of questions/controversies that HLI could address:
I hope that some of this was helpful & I'm looking forward to seeing future reports!
I think the steelman of the neglectedness argument would be something like: "The less neglected something is, the less likely it is that we would be able to help the people already working on it do it even slightly better."
This is both because (a) it is harder to change the direction of the movement and (b) it is harder to genuinely find meaningful ways to improve the movement.
Regarding (b), I wonder if there are some specific limitations of the current War-on-Drugs movement that would match the skills/interests of (some) EAs.
I'd be curious to learn more about the "types" of EAs that might be best-suited for this work, or how the "EA perspective" could enhance ongoing efforts.
As it stands, the case for scale (i.e., the magnitude of the problem) is very clear. However, I think scale is usually the strongest part of most cause area analyses (i.e., there are a lot of really big problems and it's usually not too difficult to articulate the bigness of those problems, especially using words rather than models). I think the role that EAs would play is less clear (as has been reflected in other comments relating to neglectedness). So, I wonder:
Are there some clear gaps or limitations in the current anti-War-on-Drugs movement that could be filled by EA perspectives/skills? (As an example, one of the commenters emphasized that global efforts to legalize drugs may be neglected, and EAs who have skills/interests related to global advocacy might be especially helpful.)
What a great opportunity! I wonder if people at SparkWave (e.g., Spencer Greenberg), Effective Thesis, or the Happier Lives Institute would have some ideas. All three organizations are aligned with EA and seem to be in the business of improving/applying/conducting social science research.
Also, I have no idea who your advisor is, but I think a lot of advisors would be open to having this kind of conversation (i.e., "Hey, there's this funding opportunity. We're not eligible for it, but I'm wondering if you have any advice..."). [Context: I'm a PhD student in psychology at UPenn.]
If that's not a good option, you could consider asking your advisor (and other academics you respect) if they know about any metascience/open science organizations that are highly effective [without mentioning anything about your relative and their interest in donating].
Finally, it's not clear to me if the donor is only interested in metascience or if they would also be open to funding "basic science" projects. "Basic science" is broad enough that I imagine it could open up a lot of alternative paths (many of which might be more explicitly EA-aligned than metascience). Examples include basic scientific research on effective giving, animal advocacy, mental health, AI safety, etc. Do you have a sense of how open to "basic science" your relative is, or was basic science just meant as a synonym for metascience?
Good luck with this! :)
Super exciting work! Sharing a few quick thoughts:
1. I wonder if you've explored some of the reasons for effect size heterogeneity in ways that go beyond formal moderator analyses. In other words, I'd be curious if you have a "rough sense" of why some programs seem to be so much better than others. Is it just random chance? Study design factors? Or could it be that some CT programs are implemented much better than others, and there is a "real" difference between the best CT programs and the average CT programs?
This seems important because, in practice, donors are rarely deciding between funding the "average" CT program and the "average" [something else] program. Instead, they'd ideally want to choose between the "best" CT program and the "best" [something else] program. In other words, when I go to GiveWell, I don't want to know about the "average" Malaria program or the "average" CT program-- I want to know the best program for each category & how they compare to each other.
This might become even more important in analyses of other kinds of interventions, where the implementation factors might matter more. For instance, in the psychotherapy literature, I know a lot of people are cautious about making too many generalizations based on "average" effect sizes (which can be weighed down by studies that had poor training procedures, recruited populations that were unlikely to benefit, etc.).
With this in mind, what do you think is currently the "best" CT program, and how effective is it?
2. I'd be interested in seeing the measures that the studies used to measure life satisfaction, depression, and subjective well-being.
I'm especially interested in the measurement of life satisfaction. My impression is that the most commonly used life satisfaction measure (this one) might lead to an overestimation of the relationship between CTs and life satisfaction. I think two (of the five) items could prime people to think more about their material conditions than their "happiness." Items listed below:
I have no data to suggest that this is true, so I'm very open to being wrong. Maybe these don't prime people toward thinking in material/economic terms at all. But if they do, I think they could inflate the effect size of CT programs on life satisfaction (relative to the effect size that would be found if we used a measure of life satisfaction that was less likely to prime people to think materialistically).
Also, a few minor things I noticed:
1. "The average effect size (Cohen’s d) of 38 CT studies on our composite outcome of MH and SWB is 0.10 standard deviations (SDs) (95% CI: 0.8, 0.13)."
I believe there might be a typo here-- was it supposed to be "0.08, 0.13"?
2. I believe there are two "Figure 5"s-- the forest plot should probably be Figure 6.
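The suspected typo in item 1 can be sanity-checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the "0.08" lower bound is my guess at the intended value, not something confirmed by the report.

```python
def ci_is_consistent(estimate: float, lower: float, upper: float) -> bool:
    """Return True if the interval is ordered and contains the point estimate."""
    return lower <= estimate <= upper


# Interval as printed in the report: the lower bound exceeds the upper bound.
printed = ci_is_consistent(0.10, 0.8, 0.13)

# Interval under the guessed correction (0.08 is an assumption).
guessed = ci_is_consistent(0.10, 0.08, 0.13)

print(f"as printed: {printed}, guessed correction: {guessed}")
```

The printed interval fails the check because its lower bound (0.8) is larger than both the estimate and the upper bound, which is why I suspect a dropped zero.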
Best of luck with next steps-- looking forward to seeing analyses of other kinds of interventions!
What are the things you look for when hiring? What are some skills/experiences that you wish more EA applicants had? What separates the "top 5-10%" of EA applicants from the median applicant?
Thank you, Denise! I think this gives me a much better sense of some specific parts of the post that may be problematic. I still don't think this post, on balance, is particularly "bad" discourse (my judgment might be too affected by what I see on other online discussion platforms-- and maybe as I spend more time on the EA forum, I'll raise my standards!). Nonetheless, your comment helped me see where you're coming from.
I'll add that I appreciated that you explained why you downvoted, and it seems like a good norm to me. I think some of the downvotes might just be people who disagree with you. However, I also think some people may be reacting to the way you articulated your explanation. I'll explain what I mean below:
In the first comment, it seemed to me (and others) like you assumed Mark intentionally violated the norms. You also accused him of being unkind and uncurious without offering additional details.
In the second comment, you linked to the guidelines, but you didn't engage with Mark's claim ("I think this was kind and curious given the context."). This seemed a bit dismissive to me (akin to when people assume that a genuine disagreement is simply due to a lack of information/education on the part of the person they disagree with).
In the third comment (which I upvoted), you explained some specific parts of the post that you found excessively unkind/uncivil. This was the first comment where I started to understand why you downvoted this post.
To me, this might explain why your most recent post has received a lot of upvotes. In terms of "what to make of this," I hope you don't conclude "users should not explain why they downvote." Rather, I wonder if a conclusion like "users should explain why they downvote comments, and they should do so in ways that are kind & curious, ideally supported by specific examples when possible" would be accurate. Of course, the higher the bar to justify a downvote, the fewer people will do it, and I don't think we should always expect downvote-explainers to write up a thorough essay on why they're downvoting.
Finally, I'll briefly add that upvotes/downvotes are useful metrics, but I wouldn't place too much value in them. I'm guessing that upvotes/downvotes often correspond to "do I agree with this?" rather than "do I think this is a valuable contribution?" Even if your most recent comment had 99 downvotes, I would still find it helpful and appreciate it!
Thank you for this post, Mark! I appreciate that you included the graph, though I'm not sure how to interpret it. Do you mind explaining what the "recommendation impression advantage" is? (I'm sure you explain this in great detail in your paper, so feel free to ignore me or say "go read the paper" :D).
The main question that pops out for me is "advantage relative to what?" I imagine a lot of people would say "even if YouTube's algorithm is less likely to recommend [conspiracy videos/propaganda/fake news] than [traditional media/videos about cats], then it's still a problem! Any amount of recommending [bad stuff that is harmful/dangerous/inaccurate] should not be tolerated!"
What would you say to those people?
I read this post before I encountered this comment. I didn't recall seeing anything unkind or uncivil. I then re-read the post to see if I missed anything.
I still haven't been able to find anything problematic. In fact, I notice a few things that I really appreciate from Mark. Some of these include:
Overall, I found the piece to be thoughtfully written & in alignment with the community guidelines. I'm also relatively new to the forum, though, so please point out if I'm misinterpreting the guidelines.
I'll also add that I appreciate/support the guideline of "approaching disagreements with curiosity" and "aim to explain, not persuade." But I also think that it would be a mistake to overapply these. In some contexts, it makes sense for a writer to "aim to persuade" and approach a disagreement from the standpoint of expertise rather than curiosity.
Like any post, I'm sure this post could have been written in a way that was more kind/curious/community-normsy. But I'm struggling to see any areas in which this post falls short. I also think "over-correcting" could have harms (e.g., causing people to worry excessively about how to phrase things, deterring people from posting, reducing the clarity of posts, making writers feel like they have to pretend to be super curious when they're actually trying to persuade).
Denise, do you mind pointing out some parts of the post that violate the writing guidelines? (It's not your responsibility, of course, and I fully understand if you don't have time to articulate it. If you do, though, I think I'd find it helpful & it might help me understand the guidelines better.)
Thank you, Michael! I think this hypothetical is useful & makes the topic easier to discuss.
Short question: What do you mean by "user error?"
Longer version of the question:
Let's assume that I fill out weights for the various categories of desire (e.g., health, wealth, relationships) & my satisfaction in each of those areas.
Then, let's say you erase that experience from my mind, and then you ask me to rate my global life satisfaction.
Let's now assume there was a modest difference between the two ratings. It is not instinctively clear to me why I should prefer judgment #1 to judgment #2. That is, I think it's an open question whether the "desire-based life satisfaction judgment" or the "desire-free life satisfaction judgment" is the more "valid" response.
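To make the two judgments concrete, here is a rough sketch of how I'm picturing them. Everything in it is an assumption for illustration: the 0-10 scale, the weights, the ratings, the idea that judgment #1 is a simple weighted average, and the function name.

```python
from typing import Mapping


def desire_based_score(weights: Mapping[str, float],
                       satisfaction: Mapping[str, float]) -> float:
    """Weighted average of per-domain satisfaction ratings."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[d] * satisfaction[d] for d in weights) / total_weight


# Hypothetical importance weights and 0-10 satisfaction ratings per domain.
weights = {"health": 5, "wealth": 2, "relationships": 3}
satisfaction = {"health": 6, "wealth": 4, "relationships": 8}

# Judgment #1: built up from the weighted domains ("desire-based").
judgment_1 = desire_based_score(weights, satisfaction)

# Judgment #2: a single global rating given without the domain exercise
# ("desire-free"). The value here is made up.
judgment_2 = 7.0

gap = judgment_2 - judgment_1
print(f"desire-based: {judgment_1:.1f}, desire-free: {judgment_2:.1f}, gap: {gap:.1f}")
```

The gap is the "modest difference" I have in mind. The sketch doesn't say which number is more valid; it only shows that the two are produced by different procedures.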
To me, "user error" could mean several things:
In other words, if we could eliminate these forms of user error, I would probably agree with you that this distinction is arbitrary. In practice, though, I think these "desire-based" and "desire-free" versions of life satisfaction ought to be considered distinct (though I'd expect them to be modestly correlated). It's also not clear to me that the "desire-based" judgment should be considered better (i.e., more valid). And even if it should be considered better, I think I'd still want to know about the
Furthermore, when making decisions, I would probably want to see both judgments. For example, let's assume:
I would prefer Intervention C over Intervention A, even though they both improve "desire-based satisfaction judgments" by the same amount. I also think reasonable people would disagree when comparing Intervention A to Intervention B.
For these reasons, I wonder if it's practically useful to consider "desire-based" and "desire-free" life satisfaction as separate constructs.