I am an attorney in a public-sector position not associated with EA, although I cannot provide legal advice to anyone. My involvement with EA has so far been mostly limited to writing checks to GiveWell and other effective charities in the Global Health space, as well as some independent reading. I have occasionally read the forum and was looking for ideas for year-end giving when the whole FTX business exploded . . .
As someone who isn't deep in EA culture (at least at the time of writing), I may be able to offer a perspective on how the broader group of people with sympathies toward EA ideas might react to certain things. I'll probably make some errors that would be obvious to others, but sometimes a fresh set of eyes can help.
Agree that there seem to be some strawmen in HLI's response:
We don’t believe that the entire field of LMIC psychotherapy should be considered bunk, compromised, or uninformative.
Has anyone suggested that the "entire field of LMIC psychotherapy" is "bunk"?
If one insisted only on using charity evaluations that had every choice pre-registered, there would be none to choose from.
Has anyone suggested that, either? As I understand it, it's typical to look at debatable choices that happen to support the author's position with a somewhat more skeptical lens if they haven't been pre-registered. I don't think anyone has claimed that the lack of pre-registration for certain choices is somehow fatal, only that it's a factor to consider.
Epistemic status: tentative, it's been a long time since reading social science papers was a significant part of my life. Happy to edit/retract this preliminary view as appropriate if someone is able to identify mistakes.
Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach.
I can't access Cuijpers et al., but I don't read Tong et al. as supporting what HLI has done here.
In their article, Tong et al. provide the effect size with no exclusions, then with outliers excluded, then with "extreme outliers" excluded (the last of which seems to track HLI's removal criterion). They also provide the effect size with various publication-bias measures applied. See PDF at 5-6. If I'm not mistaken, the publication-bias measures are applied to the no-exclusions version, not to a version with outliers removed or limited to studies with lower RoB. See id. at 6 tbl.2 (n = 117 for the combined and for 2 of 3 publication-bias effect sizes; 153 with trim-and-fill adding 36 studies; n = 74 for outliers removed & n = 104 for extreme outliers removed; the effect sizes after publication-bias measures, ranging from 0.42 to 0.60, seem to be those mentioned in HLI's footnote above).
Tong et al. "conducted sensitive analyses comparing the results with and without the inclusion of extreme outliers," PDF at 5, discussing the results without exclusion first and then the results with exclusion. See id. at 5-6. Tables 3-5 are based on data without exclusion of extreme outliers; the versions of Tables 4 and 5 that excludes extreme outliers are relegated to the supplemental tables (not in PDF). See id. at 6. This reads to my eyes as treating both the all-inclusive and extreme-outliers-excluded data seriously, with some pride of place to the all-inclusive data.
I don't read Tong et al. as having concluded that either the all-inclusive or the extreme-outliers-excluded results were more authoritative; they say things like:
Lastly, we were unable to explain the different findings in the analyses with vs. without extreme outliers. The full analyses that included extreme outliers may reflect the true differences in study characteristics, or they may imply the methodological issues raised by studies with effect sizes that were significantly higher than expected.
and
Therefore, the larger treatment effects observed in non-Western trials may not necessarily imply superior treatment outcomes. On the other hand, it could stem from variations in study design and quality.
and
Further research is required to explain the reasons for the differences in study design and quality between Western and non-Western trials, as well as the different results in the analyses with and without extreme outliers.
PDF at 10.
Of course, "further research needed" is an almost inevitable conclusion of the majority of academic papers, and Tong et al. have the luxury of not needing to reach any conclusions to inform the recommended distribution of charitable dollars. But I don't read the article by Tong et al. as supporting the proposition that it is appropriate to just run with the outliers-excluded data. Rather, I read the article as suggesting that -- at least in the absence of compelling reasons to the contrary -- one should take both analyses seriously, but neither definitively.
I lack confidence about what taking both analyses seriously, but neither definitively, would mean for purposes of conducting a cost-effectiveness analysis. But I speculate that it would likely involve some sort of weighting of the two views, along the lines of the sketch below.
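As a purely illustrative sketch of what such weighting could look like -- in Python, with made-up effect sizes and an arbitrary outlier threshold; none of these numbers come from Tong et al. or HLI -- one could pool the studies with and without the extreme outliers and then average the two pooled estimates by one's credence in the exclusion decision:

```python
import numpy as np

# Entirely hypothetical per-study effect sizes (e.g., Hedges' g) and
# standard errors -- illustrative only, not Tong et al.'s or HLI's data.
effects = np.array([0.35, 0.50, 0.42, 0.60, 1.90, 2.40])
ses = np.array([0.10, 0.12, 0.09, 0.15, 0.20, 0.25])

def pooled(effects, ses):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    w = 1.0 / ses**2
    est = np.sum(w * effects) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))

# Analysis 1: all studies included.
est_all, se_all = pooled(effects, ses)

# Analysis 2: "extreme outliers" excluded (here, arbitrarily, g > 1.5).
keep = effects <= 1.5
est_excl, se_excl = pooled(effects[keep], ses[keep])

# Take both analyses seriously, neither definitively: blend by one's
# credence in the exclusion decision (0.5 = equal weight; a pure stand-in).
credence_in_exclusion = 0.5
blended = (1 - credence_in_exclusion) * est_all + credence_in_exclusion * est_excl

print(f"all-inclusive:     {est_all:.2f} (SE {se_all:.2f})")
print(f"outliers excluded: {est_excl:.2f} (SE {se_excl:.2f})")
print(f"credence-weighted: {blended:.2f}")
```

The credence parameter does all the work here, and picking it is itself a judgment call; the point is only that neither analysis need be discarded outright.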
Agree re: the likelihood of a second-order awareness / social-proof impact. An absence of clustering would make me think the second-order effect was the predominant one, which would be useful info for designing rewards for future fundraisers.
What would people think of adding SummaryBot functionality to some very long comments? The emergence of a new HLI/SM thread reminds me that some comments are post-length and post-complexity contributions; many of them would benefit from a summary. That can be particularly valuable where the original poster and commenter start a dialogue, with long replies to each other's comments. Those threads can take a significant time commitment to get through!
Unsure what the cutoff should be to trigger a comment summary -- maybe 500-600 words?
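For concreteness, the trigger I have in mind could be as simple as the following sketch (hypothetical function, not anything SummaryBot actually implements; the 550-word default is just a placeholder within the range above):

```python
def should_summarize(comment_text: str, threshold_words: int = 550) -> bool:
    """Trigger a summary once a comment exceeds a word-count threshold."""
    return len(comment_text.split()) >= threshold_words
```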
I think I can diagnose the underlying problem: Bayesian methods are very sensitive to the stipulated prior. In this case, the prior is likely too high, and definitely too narrow/overconfident.
Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy's effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?
One of the objections to HLI's earlier analysis was that it was just implausible in light of what we know of psychotherapy's effectiveness more generally. I don't know that literature well at all, so I don't know how the effect size in the new stipulated prior compares to the effect size for short-course general-purpose psychotherapy generally. However, given the methodological challenges with measuring effect size in LMICs on the available data, it seems like a more general understanding of the effect size should factor into the informed prior somehow. Of course, the LMIC context is considerably different from the context in which most psychotherapy studies have been done, but I am guessing it would be easier to manage quality-control issues with the much broader research base available. So both knowledge bases would likely inform my prior before turning to charity-specific evidence.
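As a very rough sketch of that multi-stage updating -- simple normal-normal conjugate updates in Python, with entirely made-up effect sizes and uncertainties; the real analysis would be far more involved -- the prior would be built up in stages before meeting the charity-specific evidence:

```python
import numpy as np

def update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update; returns (posterior mean, posterior sd)."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / data_se**2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    return post_mean, np.sqrt(1.0 / (w_prior + w_data))

# Stage 1: prior from short-course general-purpose psychotherapy at large
# (hypothetical effect size and uncertainty, in SD units).
mean, sd = 0.30, 0.20

# Stage 2: update on the noisier, quality-challenged LMIC evidence.
mean, sd = update(mean, sd, data_mean=0.60, data_se=0.25)

# Stage 3: update on charity-specific evidence (e.g., an RCT).
mean, sd = update(mean, sd, data_mean=0.10, data_se=0.15)

print(f"posterior after all three stages: mean {mean:.2f}, sd {sd:.2f}")
```

Each stage's posterior becomes the next stage's prior, so an overconfident (too-narrow) input at any stage propagates all the way through.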
[Edit 6-Dec-23: Greg's response to the remainder of this comment is much better than my musings below. I'd suggest reading that instead!]
To my not-very-well-trained eyes, one hint that there's an issue with the application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence. If the model were sound, it would seem very unlikely that the first organization evaluated to the medium-to-in-depth level would happen to have charity-specific evidence suggesting an effect size that diverged so strongly from what the model predicted. I think most of us, when faced with such a circumstance, would question whether the model was sound and would put it on the shelf until performing other charity-specific evaluations at the medium-to-in-depth level. That would be particularly true to the extent the model's output depended significantly on the methodology used to clean up some problems with the data.[2]
By "general-purpose," I mean to exclude psychotherapies targeted at certain narrow problems (e.g., CBT-I for insomnia, exposure therapy for phobias).
If Greg's analysis is correct, it seems I shouldn't assign the informed prior much more credence than I assign to HLI's decision to remove outliers (and, to a lesser extent, its choice of method). So, again to my layperson's way of thinking, one partial framing of the crux could be that the reader must weigh their confidence in HLI's outlier-treatment decision against their confidence in the Baird/Ozler RCT on SM; the toy sketch below illustrates.
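To put a toy model behind that crux -- again with hypothetical numbers, not HLI's or Baird/Ozler's actual figures, and with the crude simplification that a full treatment would also reweight each prior by how well it predicts the RCT data -- one can compare the posterior under a narrow informed prior against the posterior under a broad skeptical prior that largely defers to the RCT, blended by one's credence in the outlier-treatment decision:

```python
import numpy as np

def update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update; returns (posterior mean, posterior sd)."""
    w_prior, w_data = 1.0 / prior_sd**2, 1.0 / data_se**2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    return post_mean, np.sqrt(1.0 / (w_prior + w_data))

# Hypothetical stand-ins -- not the actual RCT figures.
rct_mean, rct_se = 0.10, 0.12

# Posterior under a narrow "informed" prior built on the outlier removal,
# and under a broad skeptical prior that mostly defers to the RCT.
informed_mean, _ = update(0.70, 0.10, rct_mean, rct_se)
skeptical_mean, _ = update(0.20, 0.50, rct_mean, rct_se)

for p in (0.2, 0.5, 0.8):  # p = credence in the outlier-treatment decision
    blend = p * informed_mean + (1 - p) * skeptical_mean
    print(f"credence {p:.1f} in outlier removal -> bottom line ~{blend:.2f}")
```

On these made-up numbers, the bottom line swings substantially with the credence parameter, which is the sense in which the outlier-treatment decision is the crux.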
I thought this recent study in JAMA Open on vegan nutrition was worth a quick take due to its clever and legible study design:
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2812392
This was an identical twin study in which one twin went vegan for eight weeks, and the other didn't. Nice results on some cardiometabolic lab values (e.g., LDL-C) even though the non-vegan twin was also upping their game nutritionally. I don't think the fact that vegan diets generally improve cardiometabolic health is exactly fresh news, but I find the study design to be unusually legible for nutritional research.
Agreed on it not being the highest of bars. I felt there was a big gap between your (2) and (3), so was aiming at ~ 2.4 to 2.5: neither peripheral nor widespread, with the understanding that the implied scale is somewhat exponential (because 3 is much worse than 2).
What probability would you assign to some weakened version of (3) being true? By some weakened version, I roughly mean taking the way out of "way too many," and defining too many as ~ meaningfully above the base rate for people in positions of power/influence.
The FTX scandal was horrible for its victims, and immoral period, but it is not effective altruism. Just because someone is involved in the EA community, and identifies as EA, doesn't mean they do effective altruism.
That's true, but in totaling up the benefits and costs of EA, we have to consider it. I think the test is roughly whether the wrongdoer was successful in using EA to facilitate their bad deeds, or was materially motivated by their exposure to EA. I think the answer for FTX is yes -- SBF obtained early funding from EA-aligned sources, attracted critical early recruits (some of whom turned into co-conspirators), and gained valuable PR benefits from his association with EA. (One could argue that he was also motivated by EA, but I'm not confident that I believe that.)
In the same way, I would weigh mistreatment of children facilitated by association with the Catholic Church or Scouting in assessing those movements, even though child abuse is not the practice of Catholicism or of Scouting. In contrast, I wouldn't generally weigh the unfacilitated/unmotivated acts -- good or bad -- of effective altruists, Catholics, or Scout volunteers in assessing the movements to which they belong.
This makes a lot of sense generally, but I see one issue that seems potentially significant.
I have a fairly good understanding of what will happen to more cause-area-specific yet "meta" grants in the x-risk/longtermism and animal-welfare domains. The view that the LTFF and AWF are better suited to funding these opportunities seems fairly compelling. The issue I see is that EA Funds' Global Health and Development Fund (GHDF) seems to have focused on larger grants to more established organizations; this makes sense given its strong connection to GiveWell's work. That doesn't feel like a good fit for opportunities like those described in items (4) and (5) of your examples of out-of-scope projects. According to its website, GHDF isn't even accepting applications. Thus, while these sorts of projects are not formally outside of GHDF's scope -- e.g., it has granted to One for the World -- they may be inaccessible as a practical matter.[1]
Perhaps the ideal solution would be for GHDF to start taking applications that would previously have been within EAIF's scope, so that there is a relatively seamless transition for potential and established grantees. I'm not sure if that is practicable for GHDF, though?
A second possibility would be for EAIF to retain the global-health/development scope for a stated time period, but (for donations received after a specified date in 2024) only out of donor funds that have been designated for that specific scope. That would allow more clarity of scope for EAIF donors while providing a conduit for donors who feel strongly about global-health/development meta work.
Finally, the exit strategy could be slowed down for global health/development specifically, in recognition of the lack of an obvious alternative fund for these sorts of grants. Although exit grants would soften existing grantees' landing for projects receiving ongoing support, it seems plausible that potential grantees may have done significant groundwork for new projects or expansions based on the funding universe as it existed prior to this plan being made public. Moreover, even if one expects other grantmakers would eventually step in to fill the void, this would likely take time. Thus, 1Q 2024 may be too soon for phasing out the current version of EAIF, at least where global health/development meta activity is concerned.
I'm aware of Open Phil's work in global health/wellbeing community building, but as you note one of the objectives here is to move toward "a more community-funded model, in contrast to . . . previous reliance on significant institutional donors like Open Phil." A plan in which Open Phil picks up responsibility for funding these sorts of grants in global health/wellbeing seems like a step backward from this objective.