ishaan

If you're interested in why I figure that, for most people, doing voluntary unpaid labor involves negligible harm or opportunity cost of the type that ought to change our effectiveness estimate, my napkin calculation goes something like this. Suppose I personally were to replace half my compensated work hours with volunteering. That would be several tens of thousands of dollars of value to me and would halve my income, which would be a hardship to me. But it should only take ~$285.92 to "offset" that harm to me personally, by doubling someone else's income: theoretically, the harm I experience from losing one day's labor can be fully offset by giving a GiveDirectly recipient $0.78. Obviously my own labor is worth more than $0.78 a day on the market, but matters being how they are, $0.78 seems to be around what it takes to "offset" the humanitarian cost to me personally (and the same goes for any volunteer, with respect to them personally) of losing a day's labor.

Having established that, how much unpaid labor are we talking about? Groups of 5-12 people meet for 90 minutes a week over six sessions. Assuming 8 hours of capacity-to-work per day, a volunteer does 5.6-13.5 days of labor per client treated:

1.5 hours × 6 sessions × (5 to 12 people) / 8-hour workday = 5.6-13.5 days of labor per client treated

So if we imagine that volunteers are actually missing work opportunities and sacrificing income in order to do this, a $4.37-$10.53 donation to GiveDirectly per client treated might represent the cost to "offset" these potential "harms". But hopefully volunteers aren't actually choosing to forgo vital economic opportunities and take on hardship in order to volunteer, and are doing it because they like it, want to, and get something out of it, so maybe the true opportunity cost/disvalue to them is <20% of the equivalent labor-hours, if not zero?

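
The napkin arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not SoGive's model: it assumes $0.78 per lost day of labor (the figure that makes the stated $4.37-$10.53 range work out, and roughly $285.92 spread over 365 days), follows the labor formula exactly as written, and the helper names and the `disvalue_fraction` parameter are hypothetical.

```python
# Minimal sketch of the volunteer-labor napkin calculation; helper names are hypothetical.
HOURS_PER_SESSION = 1.5    # 90-minute weekly meetings
SESSIONS = 6               # six sessions per course
WORKDAY_HOURS = 8          # assumed capacity-to-work per day
OFFSET_PER_DAY_USD = 0.78  # assumed cost to "offset" one lost day of labor via GiveDirectly


def labor_days_per_client(group_size: int) -> float:
    """Days of labor per client treated, using the formula exactly as written in the text."""
    return HOURS_PER_SESSION * SESSIONS * group_size / WORKDAY_HOURS


def offset_cost_usd(group_size: int, disvalue_fraction: float = 1.0) -> float:
    """Donation that would "offset" the lost labor.

    disvalue_fraction < 1 models volunteering being only partly a sacrifice
    (e.g. 0.2 for the "<20% of the equivalent labor-hours" case).
    """
    return labor_days_per_client(group_size) * OFFSET_PER_DAY_USD * disvalue_fraction


for n in (5, 12):
    print(
        f"group of {n}: {labor_days_per_client(n):.3f} days, "
        f"full offset ${offset_cost_usd(n):.2f}, at 20% ${offset_cost_usd(n, 0.2):.2f}"
    )
```

The upper bound matches the $10.53 in the text; the lower bound lands slightly above $4.37 because the text rounds 5.625 days down to 5.6 before multiplying.
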
Anyway, this all seems kind of esoteric and theoretical. It was just something I had considered adding to the analysis (because at first I had an intuition, perhaps one you shared and which motivated your question, that volunteer labor isn't really free just because it is unpaid), and then discarded because the effect seemed too negligible to be worth the complications it added. As I said in the other comment, all of that is quite different from the separate matter of what it might cost to pay a staff member or volunteer: staff and volunteers presumably have more earning potential than GiveDirectly recipients, who are selected for being low-income, and might be able to command salaries much higher than that. And I don't actually know what the program might cost if all volunteers were instead paid, or exactly to what extent the main costs are about the labor of the people directly playing therapist roles. (Total staff salaries vs. other costs might be easier to find, but that answers a different question than the one you asked, since staff infrastructure isn't all about leading therapy sessions.)

To clarify, when I said it "would have been negligible", what I meant more precisely is: "doing voluntary unpaid labor does negligible harm (and probably actually confers benefits) to the volunteer, so I didn't factor it into the cost estimate". This is very different from the question "how much would the cost change if the volunteers were paid?", which is a question I don't know the answer to.

That's an interesting question and I don't know! I had myself implicitly been assuming it was because volunteer labor is unpaid. But now that you put the question to me explicitly, I realize that it could easily be due to other factors (such as lower expenses in some other category, e.g. infrastructure, or lower salaries per staff member, hypothetically).

I don't think the answer to this question would change anything about our calculations, but if we happen to find out I'll let you know. In an earlier draft I was considering counting the volunteer labor as a cost equal to the size of the benefits from the cash that the volunteer might otherwise have earned, but I left it out because the effect would have been too small to make a notable difference.

> ie SoGive would think depression is worse than death. Maybe this isn't quite a "sanity check" but I doubt many people have that moral view.

I replied in the moral weights post regarding the "worse than death" point. (I think that's a fundamentally fair but fundamentally different point from what I meant by sanity checks, i.e. not crossing hard lower bounds given the empirical effects of cash on well-being vs. the empirical effects of mental health interventions on well-being.)

My response to this post overall is that I think part of what is going on here is that different people and different organizations mean very different things when they say "depression". Since depression is not really a binary, the value of averting "1 case of severe depression" can change a lot depending on how you define severity: reasonable definitions of "sufficiently bad depression" can plausibly differ by 1-3x when you break them down into "how many SD counts as curing depression" terms. However, the in-progress nature of SoGive's mental health work makes pinning down what we do mean sort of tricky. What exactly did the participants in the SoGive Delphi Process mean when they said "severe depression"? How should I, as an analyst who isn't aiming to set the moral weights but is attempting to advise people using them, interpret that? These things are currently in flux, in the sense that I'm basically in the process of making various judgement calls about them right now, which I'll describe below.

You commented:

> I'm not sure 2-5 SD-years is plausible for severe depression. 3 SDs would saturate the entire scale 0-24.

It's true that the PHQ-9, whose scores top out at 27 points, spans only around 2-4 SD. How many SD exactly depends on the spread of your population, of course (for example, if 1 SD = 6.1 points, then the range of a 27-point scale spans 4.42 SD), and for some population spreads it would be 3 SD.

These two things are related, actually! I think the trouble is that the term "severe depression" is ambiguous as to how bad it is, so different people can mean different things by it. One might argue that the following was an awkward workaround which should have been done differently, but to make my internal thought process transparent (in terms of what I thought after joining SoGive, starting this analysis, and encountering these weights), it went as follows:

-> "Hm, this implies we're willing to trade averting 25 years of depression against one (mostly neonatal) death. Is this unusual?"
-> "Maybe we are thinking about the type of severe, suicidal depression that is an extremely net-negative experience, a state which is worse than death."
-> "Every questionnaire creator seems to have recommended cut-offs for gradients of depression such as 'mild' and 'moderate' (e.g. the creators of the PHQ-9 scale recommend 20 points as the cut-off for 'severe' depression), but these aren't consistent between scales and are ultimately arbitrary choices."
-> "Extrapolating linearly from the time-trade-off literature, people seem to think that a year of depression breaks even with dying a year earlier at around 5.5 SD. Maybe less if it's not linear."
-> "But maybe it should be more, because what's really happening here is that we're seeing multiple patients improve by 0.5-0.8 SD. The people surveyed in that paper think that the difference from 2 SD to 3 SD is bigger than from 1 SD to 2 SD. People might disagree on the correct way to sum these up."
-> concluding with me thinking that various reasonable people might set the standard for "averting severe depression" anywhere between 2 and 6 SD, depending on whether they have ordinary severity or worse-than-death severity in mind.

So, hopefully that answers your question as to why I wrote to you that 2-5 SD is reasonable for severe depression. I'm going to try to justify this further in subsequent posts. Some additional thoughts I had were:

-> I notice that this still weights depression more heavily than the people surveyed in the time-trade-off study did, but if we set it in the higher range of 3-6 SD it still feels like a morally plausible view (especially considering that some people might assign lower moral weight to neonates).
-> My role is to tell people what the effect is, not to tell them what moral weights to use. However, I'm noticing that all the wiggle room to interpret what "severe" means is on me, and I notice that I keep wanting to nudge the SD-years I accept higher in order to make the view match what I think is morally plausible.
-> I'll just provisionally use something between 3 and 5 SD-years for the purpose of completing the analysis, because my main aim is to figure out what therapy does in terms of SD.
-> But I should probably publish a tool that allows people to think about moral weights in terms of standard deviations, and maybe we can survey people for moral weights again in the future in a manner that lets them talk about standard deviations rather than whatever connotations they attached to "severe depression". Then we can figure out what people really think about various grades of depression and how much income and life they're willing to trade for it.

In fact, the next thing I'm scheduled to publish is a write-up that talks in detail about how to translate SD into something more morally intuitive, so hopefully that will help us make some progress on the moral weights issue.

So, to summarize, I think (assuming your calculations regarding everyone else's weights are correct) what's going on here is that it looks like SoGive is weighting depression 4x more than everyone else, but those moral weights were set in the absence of concrete recommendations, and in the final analysis, once a concrete recommendation comes out, it probably won't be that different? (Arguably, that would be an artifact of my choosing, after the fact, to set a really high SD threshold for "severity" as a reaction to the weights, and what really needs to happen is that we go through the process I described of polling people again in a way that breaks down "severity" differently.) That said, you've added two items, SD<->DALY/WELLBY and cash<->SD, to my list of things to check for robustness, and if either ends up being notable I'm definitely going to flag it, so thank you for that. I do think that this story will ultimately end with some revisiting of moral weights: how they should be set, what they mean, and how to communicate them.

(There's another point that came up in the other thread, though, regarding "does it pass the sanity check w.r.t. cash transfer effects on well-being", which this doesn't address. Although it falls outside the scope of my current work, I have been wanting to get a firmer sense of the empirical cash <-> WELLBY <-> SD-of-depression correlations, and apropos of your comments perhaps this should be made more explicit in moral weights agendas.)

To expand a little on "this seems implausible": I feel like there is probably a mistake somewhere in the notion that anyone involved thinks of doubling income as having a 1.3 WELLBY effect and of severe depression as having a 1.3 WELLBY effect. The mistake might be in your interpretation of HLI's document (it does look like the 1.3 figure is a small part of a more complicated calculation regarding the economic impacts of AMF and their effect on well-being, rather than intended as a headline finding about the cash-to-well-being conversion rate).
Or it could be that HLI has an error or inconsistencies between reports. Or it could be that, for some reason, it's not valid to apply that 1.3 number to SoGive's "income doubling" weight, because it doesn't actually refer to the WELLBY value of doubling income. I'm not sure exactly where the mistake is, so it's quite possible that you're right, or that we are both missing something about the math that makes this work out, but I'm suspicious because it doesn't really fit with various other pieces of information I know. For instance, it doesn't really square with HLI reporting that psychotherapy is 9x as cost-effective as GiveDirectly when the cost of treating one person with therapy is around $80, or with their estimate that it took $1000 worth of cash transfers to produce 0.92 SD-years of subjective-well-being improvement ("totally curing just one case of severe depression for a year" should correspond to something more like 2-5 SD-years).
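
To make that cross-check concrete, the quoted HLI figures can be chained together in a few lines of Python. This is a rough illustration under the assumption that "9x GiveDirectly" means nine times as many SD-years per dollar as cash transfers; it is not HLI's or SoGive's model, and the constant names are made up for the sketch.

```python
# Chaining the figures quoted in the text; treat every constant as approximate.
CASH_SD_YEARS_PER_1000_USD = 0.92  # SD-years of subjective well-being per $1000 of cash transfers
THERAPY_VS_CASH_MULTIPLIER = 9     # psychotherapy reported as ~9x GiveDirectly (assumed: per dollar)
THERAPY_COST_PER_CLIENT_USD = 80   # approximate cost of treating one person
SEVERE_CASE_SD_YEARS = (2, 5)      # "curing one case of severe depression for a year"

cash_sd_years_per_usd = CASH_SD_YEARS_PER_1000_USD / 1000
therapy_sd_years_per_usd = THERAPY_VS_CASH_MULTIPLIER * cash_sd_years_per_usd
therapy_sd_years_per_client = therapy_sd_years_per_usd * THERAPY_COST_PER_CLIENT_USD

print(f"implied SD-years per therapy client: {therapy_sd_years_per_client:.2f}")
for sd_years in SEVERE_CASE_SD_YEARS:
    cash_cost = sd_years / cash_sd_years_per_usd
    therapy_cost = sd_years / therapy_sd_years_per_usd
    print(f"{sd_years} SD-years via cash: ~${cash_cost:,.0f}; via therapy: ~${therapy_cost:,.0f}")
```

On these assumptions, an $80 course of therapy implies roughly 0.66 SD-years per client, and reaching the 2-5 SD-years attributed to curing one case of severe depression for a year through cash transfers alone would take a few thousand dollars.
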

I wish I could give you a clearer "ah, here is where I think the mistake is" or perhaps an "oh, you're right after all", but I too am finding the linked analysis a little hard to follow and am a bit short on time (ironically, because I'm trying to publish a different piece of StrongMinds analysis before a deadline). Maybe one of the things we can talk about once we schedule a call is how you calculated this and whether it works? Or maybe HLI will comment and clear things up regarding the 1.3 figure you pulled out and what it really means.

Good stuff. I haven't spent that much time looking at HLI's moral weights work, but I think the answer is "Something is wrong with how you've constructed weights, HLI is in fact weighing mental health harder than SoGive". A complete answer to this question requires me to check your calculations carefully, which I haven't done yet, so it's possible that this is right.

If it were true that HLI found anything on the order of roughly doubling someone's consumption improving well-being as much as averting 1 case of depression, that would be very important, as it would mean that SoGive's moral weights fail some basic sanity checks. It would imply that we should raise our moral weight on cash-doubling to at least match the cost of therapy, even under a purely subjective-well-being-oriented framework for weighting. (Why not pay 200 to double income, if it's as good as averting depression and you would pay 200 to avert depression?) This seems implausible.

I haven't actually been directly researching the comparative moral weights aspect, personally. I've been focusing primarily on <what's the impact of therapy on depression in terms of effect size> rather than on the "what should the moral weights be" question (though I have paid some attention to the "how to translate effect sizes into subjective intuitions" question, which is not quite the same thing). That said, when I have more time I will look more deeply into this and check whether our moral weights are failing some sort of sanity check of this kind, but I don't think they are.

Regarding the more general question of "where would we stand if we altered our moral weights to be something else", ask me again in a month or so when all the spreadsheets are finalized; moral weights should be relatively easy to adjust once the analysis is done.

(As Sanjay alludes to in the other thread, I do think all this is a somewhat separate discussion from the GWWC list. My main point with the GWWC list was that StrongMinds is not, in the big picture, actually super out of place among the others in terms of how evidence-backed it is, especially when you consider the background academic literature about the intervention rather than their internal data. But I wanted to address the moral weights issue directly, as it does seem like an important and separate point.)

I'm a researcher at SoGive conducting an independent evaluation of StrongMinds which will be published soon. I think the factual contents of your post here are correct. However, I suspect that after completing the research, I would be willing to defend the inclusion of StrongMinds on the GWWC list, and that the SoGive write-up will probably have a more optimistic tone than your post. Most of our credence comes from the wider academic literature on psychotherapy, rather than direct evidence from StrongMinds (which we agree suffers from problems, as you have outlined).

Regarding HLI's analysis, I think it's a bit confusing to talk about this without going into the details, because there are both "estimating the impact" and "reframing how we think about moral weights" aspects to the research. Ascertaining the cost and magnitude of therapy's effects must be considered separately from the issue that "therapy will score well when you use subjective well-being as the standard by which therapy, cash transfers and malaria nets are graded". As of now I do roughly think that HLI's numbers for the costs of therapy and its effect sizes on patients are in the right ballpark. We are borrowing the same basic methodology for our own analysis. You mentioned being confused by the methodology; there are a few points that still confuse me as well, but we'll soon be publishing a spreadsheet model with a step-by-step explainer on the aspects of the model that we are borrowing, which may help.

If you (@Simon_M, or anyone else wishing to work at a similar level of analysis) are planning on diving into these topics in depth, I'd love to get in touch on the Forum and exchange notes.

Regarding the level of evidence: SoGive's analysis framework outlines a "gold standard" for high impact, with "silver" and "bronze" ratings assigned to charities with lower-but-still-impressive cost-effectiveness ratings. However, we also distinguish between "tentative" ratings and "firm" ratings, to acknowledge that some high impact opportunities are based on more speculative estimates which may be revised as more evidence comes in. I don’t want to pre-empt our final conclusions on StrongMinds, but I wouldn’t be surprised if “Silver (rather than Gold)” and/or “Tentative (rather than Firm)” ended up featuring in our final rating. Such a conclusion would still be a positive one, on the basis of which donation and grant recommendations could be made.

There is precedent for effective altruists recommending donations to charities for which the evidence is still more tentative. Consider that GiveWell recommends "top charities" but also recommends less-proven but potentially cost-effective and scalable programs (formerly incubation grants). Identifying these opportunities allows the community to explore new interventions, and can unlock donations that counterfactually would not have been made, as different donors may make different subjective judgment calls about some interventions, or may be constrained in what they can donate to.

Having established that there are different criteria one might look at to determine whether an organization should be included in a list, and that more than one set of standards may be applied, the question arises: what sort of standards does the GWWC top charities list follow, and is StrongMinds really out of place among the others? Speaking now personally, informally, and not on behalf of any current or former employer: I would actually say that StrongMinds has much more evidence backing it than many of the other charities on this list (such as THL, Faunalytics, GFI, and WAI, which by their nature don't easily lend themselves to RCT data). Even if we restrict our scope to direct global health interventions (e.g. excluding pandemic research orgs), I wouldn't be surprised if bright and promising potential stars such as Suvita and LEEP are actually at a somewhat similar stage to StrongMinds: they are generally evidence-based enough to deserve their endorsement on this list, but I'm not sure they've been as thoroughly vetted by external evaluators as more established organizations such as Malaria Consortium might be. Because of all this, I don't think StrongMinds seems particularly out of place next to the other GWWC recommendations. (Bearing in mind again that I want to speak casually as an individual for this last paragraph, and I am not claiming special knowledge of all the orgs mentioned for the purposes of this statement.)

Finally, it's great to see posts like this on the EA forum, thanks for writing it!

Cool project! I suggest that the shrimp heart should be a different color, as most shrimp are not pink and only turn pink after cooking (although there are some exceptions, so maybe this is too nitpicky and it's fine?). I am also not sure whether a living shrimp would typically have a curled-up pose. Alternatively, if you'd rather not do a full image redesign, or if there is a concern that people will not realize it is a shrimp if it looks too different from what they're used to seeing, it might help to add "go vegan!" text or something similar to make clear that the sticker isn't saying the bearer likes eating shrimp.

I thought "EA Hotel" was pretty great as a straightforward description; good substitutes might have a word for "EA" and a word for "hotel". So like:

Bentham's Base
Helpers' House

Swap with Lodge, Hollow, Den if alliteration is too cute
e.g. "Bentham's House", "Bentham's Lodge" both sound pretty serious.

Or just forget precedent and brand something new e.g. Runway (or Runway Athena)

Some "just kidding" alliterative options that I couldn't resist:
Crypto crib, Prioritization Place, Utilitarian's Union, Consequentialist Club, Greg's iGloo