I am an attorney in a public-sector position not associated with EA, although I cannot provide legal advice to anyone. My involvement with EA has so far been mostly limited to writing checks to GiveWell and other effective charities in the Global Health space, as well as some independent reading. I have occasionally read the forum and was looking for ideas for year-end giving when the whole FTX business exploded . . .
As someone who isn't deep in EA culture (at least at the time of writing), I may be able to offer a perspective on how the broader group of people with sympathies toward EA ideas might react to certain things. I'll probably make some errors that would be obvious to other people, but sometimes a fresh set of eyes can help bring a different perspective.
I think it has potential!
Finally, I think the two approaches require very different sets of skills. My guess is that there are many more people in the EA community today (which skews young and quantitatively-inclined) with skills that are a good fit for evaluation-and-support than have skills that are an equally good fit for design-and-execution. I worry that this skills gap might increase the risk that people in the EA community might accidentally cause harm while attempting the design-and-execution approach.
This paragraph is a critical component of the argument as presently stated. However, I don't see much more than a mere assertion that (1) certain skills needed for design-and-execution (D&E) are generally missing and (2) the absence of those skills increases the risk of accidental harm. In a full post, I'd want to see this explained in more depth.
My own intuition is that a larger driver of increased harm in D&E models (vs. evaluation-and-support, E&S) may be inherent to working in a novel and neglected subject area like AI safety. In an E&S model, the startup efforts incubated independently of EA are more likely to be pretty small-scale. Even if a number of them end up being net-harmful, the risk is limited by how small they are. But in a D&E model, EA resources may be poured into an organization earlier in its life cycle, increasing the risk of significant harm if it turns out the organization was ultimately not well-conceived.
As far as mitigations, I think a presumption toward "start small, go slow" in an underdeveloped cause area for which a heavily D&E approach is necessary might be appropriate in many cases, for the reason described in the paragraph above. E.g., in some cases, the objective should be to develop the ecosystem in that cause area so that heavy work can begin in 7-10 years, rather than pouring in a ton of resources early and trying to get results ASAP. I'd like to see more ideas like that in a full post, as the suggestion to develop better "risk-management or error-correction capabilities" (while correct in my view) is also rather abstract.
And as a general rule, people try to avoid making enemies of those they perceive to have lots of power/influence. So their threat model doesn't have to be terribly well-defined to be effective at accomplishing the powerful/influential person's objective.
Where's the evidence that, e.g., everyone "act[s] as if a couple of beehives or shrimp farms are as important as a human city"? So someone wrote a speculative report about bee welfare ranges . . . if "everyone" accepted that "1 human is worth 14 bees" -- or even anything close to that -- the funding and staffing pictures in EA would look very, very different. How many EAs are working in bee welfare, and how much is being spent in that area?
As I understand the data, EA resources in GH&D are pretty overwhelmingly in life-saving interventions like AMF, suggesting that the bulk of EA does not agree with HLI at present. I'm not as well versed in farmed animal welfare, but I'm pretty sure no one in that field is fundraising for interventions costing anywhere remotely near hundreds of dollars to save a bee and claiming they are effective.
In the end, reasoning transparency by charity evaluators helps donors make better-informed moral choices. Carefully reading analyses from various sources helps me (and other donors) make choices that are consistent with our own values. EA is well ahead of most charitable movements in explicitly acknowledging that trade-offs exist and at least attempting to reason about them. One can (and should) decline to donate where a charity's treatment of those trade-offs isn't convincing. As I've stated elsewhere on this post, I'm sticking with GiveWell-style interventions at least for now.
We both think the ratio of parental grief WELLBYs to therapy WELLBYs is likely off, although that doesn't tell us which number is wrong. Given that your argument is that an implausible ratio should tip HLI off that there's a problem, the analysis below takes the view more favorable to HLI -- that the parental grief number (for which much less work has been done) is at least the major cause of the ratio being off.
As I see it, the number of WELLBYs preserved by averting an episode of parental grief is very unlikely to be material to any decision under HLI's cost-effectiveness model. Under philosophical assumptions where it is a major contributor to the cost-effectiveness estimate, that estimate is almost always going to be low enough that life-saving interventions won't be considered cost-effective on the whole. Under philosophical assumptions where life-saving programs may be cost-effective, the bulk of the effectiveness will come directly from the effect on the saved life itself. Thus, it would not be unreasonable for HLI -- which faces significant resource constraints -- to have deprioritized attempts to improve the accuracy of its estimate for WELLBYs preserved by averting an episode of parental grief.
Given that, I can see three ways of dealing with parental grief in the cost-effectiveness model for AMF. Ignoring it seems rather problematic. And I would argue that reporting the value one's relatively shallow research produced (with a disclaimer that one has low confidence in it) is often more epistemically virtuous than adjusting to some value one intuits is more likely to be correct, bereft of actual evidence to support that number. I guess the third way is to just not publish anything until one can produce more precise models . . . but that norm would make it much more difficult to bring new and innovative ideas to the table.
I don't think the thermometer analogy really holds here. Assuming HLI got a significantly wrong value for WELLBYs preserved by averting an episode of parental grief, there are a number of plausible explanations, the bulk of which would not justify no longer "listen[ing] to [them]." The relevant literature on grief could be poor quality or underdeveloped; HLI could have missed important data or modeled inadequately given the resources it could afford to spend on the question; it could have made a technical error; its methodology could be ill-suited for studying parental grief; its methodology could be globally unsound; and doubtless other reasons. In other words, I wouldn't pay attention to the specific thermometer that said it was much hotter than it was . . . but in most cases I would only update weakly against using other thermometers from the same manufacturer (the charity evaluator) or toward distrusting thermometer technology in general (the WELLBY analysis).
Moreover, I suspect there have been, and will continue to be, malfunctioning thermometers at most of the major charity evaluators and major grantmakers. The grief figure is a non-critical value relating to an intervention that HLI isn't recommending. For the most part, if an evaluator or grantmaker isn't recommending or funding an organization, it isn't going to release its cost-effectiveness model for that organization at all. Even where funding is recommended, there often isn't the level of reasoning transparency that HLI provides. If we are going to stop listening to everyone who has used a malfunctioning thermometer value in a cost-effectiveness analysis, there may not be many people left to perform them.
I've criticized HLI on several occasions before, and I'm likely to find reasons to criticize it again at some point. But I think we want to encourage its willingness to release less-refined models for public scrutiny (as long as the limitations are appropriately acknowledged) and its commitment to reasoning transparency more generally. I am skeptical of any argument that would significantly incentivize organizations to keep their analyses close to the chest.
It's a completely different conversation in my book. The post, per the title, is an assessment of HLI's model of SM's effectiveness. I don't really see Vasco's comment as being about GW's assessment of HLI's model, HLI's model itself, or SM's effectiveness with any particularity. It's more about the broad idea that the GH&D effects of almost any GH&D program may be swamped by animal-welfare and longtermist effects.
I do actually think there is a related point to be made that is appropriate to the post: (1) it is good that we have a new published analysis that SM is very likely an effective charity; because (2) even under GW's version of the analysis, some donors may find SM an attractive choice in the global health & development space because they are concerned about the meat-eater problem [link to Vasco's analysis here] and/or about environmental effects that potentially bear on life-saving and economic-development modes of action.
The reasons I'd find that kind of comment helpful -- but didn't find the comment by @Vasco, as written, well-suited for this post -- include:
(1) the perspective above is an attempt at a practical application of GW's findings that is much more closely tied to the main subject of the post (which is about SM and HLI's CEA thereof), and
(2) by noting the meat-eater problem but linking to a discussion in one's own post, rather than attempting to explain/discuss it in a post trying to nail down the GH&D effects of SM, the risk of derailing the discussion on someone else's post is significantly reduced.
Given that Jeff posted this shortly after raising the possibility that he should write a book (of the sort that could easily make it onto many lending/giving tables) -- I admire the post against potential self-interest here.
I'm not Joel (nor do I work for HLI, GiveWell, SM, or any similar organization). I do have a child, though. And I do have concerns with overemphasis on whether one is a parent, especially when one's views are based (in at least significant part) on review of the relevant academic literature. Otherwise, does one need both to be a parent and to have experienced a severe depressive episode (particularly in a low-resource context where there is likely no safety net) in order to judge the tradeoffs between supporting AMF and supporting SM?
Personally -- I am skeptical that the positive effect of therapy exceeds the negative effect of losing one's young child on a parent's own well-being. I just don't think the thought experiment you proposed is a good way to cross-check the plausibility of such a view. The consideration of the welfare of one's child (independent of one's own welfare) in making decisions is just too deeply rooted for me to think we can effectively excise it in a thought experiment.
In any event -- given that SM can deliver many courses of therapy with the resources AMF needs to save one child, the two figures don't need to be close if one believes the only benefit from AMF is the prevention of parental grief. SM's effect size would only need to be greater than 1/X of the WELLBYs lost from parental grief from one child death, where X is the number of courses SM can deliver with the resources AMF needs to prevent one child death. That is the bullet that epicurean donors have to bite to choose SM over AMF.
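To make that threshold concrete, here is my own illustrative sketch; the round numbers are made up for exposition and are not taken from either organization's analysis:

```latex
% Epicurean comparison: AMF credited only with averting parental grief.
% X = (cost for AMF to avert one child death) / (cost of one SM therapy course)
\[
  \text{WELLBYs per SM course} \;>\;
  \frac{\text{WELLBYs lost to parental grief per averted death}}{X},
  \qquad
  X \;=\; \frac{\text{cost per death averted (AMF)}}{\text{cost per course (SM)}}
\]
% Hypothetical illustration: if X = 100 and one episode of parental grief
% costs 10 WELLBYs, a single SM course only needs to deliver more than
% 0.1 WELLBYs for SM to come out ahead on this (epicurean) comparison.
```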
I don't think that's the right question for three reasons.
First, the hypothetical mother will almost certainly consider the well-being of her child (under a deprivationist framework) in making that decision -- no one is suggesting that saving a life is less valuable than therapy under such an approach. Whatever the merits of an epicurean view that doesn't weigh lost years of life, we wouldn't have lasted long as a species if parents applied that logic to their own young children.
Second, the hypothetical mother would have to live with the guilt of knowing she could have saved her child but chose something for herself.
Finally, GiveWell-type recommendations often would fail the same sort of test. Many beneficiaries would choose receiving $8X (where X = bednet cost) over receiving a bednet, even where GiveWell thinks they would be better off with the latter.
Fair points. I'm not planning to move my giving from GiveWell All Grants to either SM or GD, and don't mean to suggest that anyone else should do so either. Nor do I want to suggest we should promote all organizations above an arbitrary bar without giving potential donors any idea of how we would rank within the class of organizations that clear that bar despite meaningful differences.
I mainly wrote the comment because I think the temperature in other threads about SM has occasionally gotten a few degrees warmer than is optimally conducive to what we're trying to do here. So it was an attempt at a small preventive ice cube.
I think you're right that we probably mean different things by "one of." 5-10X differences are big and meaningful, but I don't think that insight is inconsistent with the idea that a point estimate somewhere around "above GiveDirectly" is roughly the threshold at which an organization should be on our radar as potentially worth recommending given the right circumstances.
One potential definition for the top class would be whether a person could reasonably conclude on the evidence that it was the most effective based on moral weights or assumptions that seem plausible. Here, it's totally plausible to me that a donor's own moral weights might value reducing suffering from depression relatively more than GiveWell's analysis implies, and saving lives relatively less. GiveWell's model here makes some untestable philosophical assumptions that seem relatively favorable to AMF: "deprivationist framework and assuming a 'neutral point' of 0.5 life satisfaction points." As HLI's analysis suggests at Section 3.4 of this study, the effectiveness of AMF under a WELLBY/subjective well-being model is significantly dependent on these assumptions.
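To gesture at why, here is my own back-of-the-envelope simplification of deprivationist WELLBY accounting (not GiveWell's or HLI's actual model, which also handles discounting and age effects):

```latex
% Deprivationist accounting for averting a death (simplified):
\[
  \text{WELLBYs from averting a death} \;\approx\; (\bar{w} - n) \times L
\]
% \bar{w}: expected average life satisfaction of the person saved (0-10 scale)
% n: the assumed "neutral point";  L: years of life gained
% Hypothetical illustration: with \bar{w} = 4.5 and L = 60, a neutral point of
% n = 0.5 yields ~240 WELLBYs, while n = 2.5 yields ~120 -- the neutral point
% alone can halve AMF's estimated benefit, while SM's benefit (improving the
% well-being of people who remain alive) is largely unaffected.
```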
For a donor with significantly different assumptions and/or moral weights, adjusting for those could put SM over AMF even accepting the rest of GiveWell's analysis. More moderate philosophical differences could put one in a place where more optimistic empirical assumptions, plus an expectation that SM will continue reducing cost-per-participant and/or effectively refine its approach as it scales up, could lead to the same conclusion.
Another potential definition for the top class would be whether one would feel more than comfortable recommending it to a potential donor for whom there are specific reasons to choose an approach similar to the organization's. I think GiveWell's analysis suggests the answer is yes, for reasons similar to the above. If you've got a potential donor who just isn't that enthusiastic about saving lives (perhaps due to emphasizing a more epicurean moral weighting) but is motivated to give toward reducing human suffering, SM is a valuable organization to have in one's talking points (and may well be a better pitch than any of the GiveWell top charities under those circumstances).
See footnote 3.