All of JoelMcGuire's Comments + Replies

Keep up the good struggle, my dear fish-loving friends. 

2
haven
2mo
Thanks Joel! And same to you—hope the research is going well

[Not answering on behalf of HLI, but I am an HLI employee]

Hi Michal, 

We are interested in exploring more systematic solutions to aligning institutions with wellbeing. This topic regularly arises during strategic conversations. 

Our aim is to eventually influence policy, for many of the reasons you mention. But we’re currently focusing on research and philanthropy. This is because there’s still a lot we need to learn about how to measure and best improve wellbeing. But before we attempt to influence how large amounts of resources are spent, I think... (read more)

1
Michal Porwisz
3mo
Hi Joel,  Thank you for your response. I definitely agree about the need and usefulness of more high-quality research in the field of well-being, including wellbeing priorities and cost-effectiveness analyses. Has HLI considered taking actions to promote such research, beyond conducting research on your own? On a related note, I recently came across the Global Flourishing Study (GFS), a $43.4 million initiative which can be found here: https://hfh.fas.harvard.edu/global-flourishing-study. While I haven't delved into the details of this study, its existence underscores the growing interest in well-being research. Influencing the direction and quality of such research could be incredibly impactful.

I disagree because I think writing text to indicate a sentiment is a stronger signal than pressing a button. So while it’s somewhat redundant, it adds new information IMO. 
 

As a writer, I pay attention to these signals when processing feedback.

2
Isaac King
3mo
Isn't that what the strong upvote is for?

Hi again Jason, 

 When we said "Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c and Tong et al., 2023, used a similar approach" -- I can see that what we meant by "similar approach" was unclear. We meant that, conditional on removing outliers, they identify a similar or greater range of effect sizes as outliers as we do.

This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn't look visually 'outlying' to me.... (read more)

Hi Jason, 

“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy's effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”

1. To your first point, I think adding another layer of priors is a plausible way to do things – but given the effects of psychotherapy in general appear to be similar to the estimates we come up with[1] – it’s not clear how much this would change our estimates. 

There are probably two ... (read more)

Hi Victor, 

The updated operationalization of psychotherapy that we use in our new report (page 12) is:

"For the purposes of this review, we defined psychotherapy as an intervention with a structured, face-to-face talk format, grounded in an accepted and plausible psychological theory, and delivered by someone with some level of training. We excluded interventions where psychotherapy was one of several components in a programme."

So basically this is "psychotherapy delivered to groups or individuals by anyone with some amount of training". 

Does that... (read more)

1
VictorW
4mo
That's clear to me now and thank you also for the pointer on 1 to 1 effectiveness!

They only include costs to the legal entity of StrongMinds. To my understanding, this includes a relatively generous stipend they provide to the community health workers and teachers who are "volunteering" to deliver StrongMinds programs, as well as grants StrongMinds makes to NGOs to support their delivery of StrongMinds programs. 

Note that 61% of their partnership treatments are through these volunteer+ arrangements with community health workers and teachers. I'm not too worried about this since I'm pretty sure there aren't meaningful additional costs to consider. t... (read more)

Hi Nick, 

Good question. I haven't dug into this in depth, so consider this primarily my understanding of the story. I haven't gone through an itemized breakdown of StrongMinds costs on a year by year basis to investigate this further.

It is a big drop from our previous costs. But I originally did the research in Spring 2021, when 2020 was the last full year. That was a year with unusually high costs. I didn't use those costs because I assumed this was mostly a pandemic related aberration, but I wasn't sure how long they'd keep the more expensive practi... (read more)

2
NickLaing
4mo
Thanks yeah that's a great graphic. Do they include government salaries and other NGO costs as part of their costs too?

Hi Jason, 

The bars for AMF in Figure 2 should represent the range of cost-effectiveness estimates that come from inputting different neutral points, and for TRIA the age of connectedness. 

This differs from the values given in Table 25 on page 80 because, as we note below that table, the values there are based on assuming a neutral point of 2 and a TRIA age of connectedness of 15. 

The bar also differs from the range given in Figure 13 on page 83 because the lowest TRIA value there has an age of connectedness of 5 years, whereas in Figure 2 (here) we a... (read more)

4
Jason
4mo
Thanks -- that helps. "The lines for AMF (the Against Malaria Foundation) are different from the others" was really confusing without the footnote call following it; it makes a lot more sense now after the fix!

Neat work. I wouldn't be surprised if this ends up positively updating my view on the cost-effectiveness of advocacy work. 

What's your take on the possibility that someone could empirically tackle a related issue we also tend to do a lot of guessing at -- the likelihood of $X million spent on advocacy in a certain domain leading to reform?

4
zdgroff
5mo
I think there are probably ways to tackle that but don't have anything shovel-ready. I'd want to look at the general evidence on campaign spending and what methods have been used there, then see if any of those would apply (with some adaptations) to this case.

The prospect of a nuclear conflict is so terrifying I sometimes think we should be willing to pay almost any price to prevent such a possibility. 

But when I think of withdrawing support for Ukraine or Taiwan to reduce the likelihood of nuclear war, that doesn't seem right either -- as it'd signal that we could be threatened into any concession if nuclear threats were sufficiently credible.

How would you suggest policymakers navigate such terrible tradeoffs?

3
Andy Weber
6mo
That’s part of the job - there are few easy policy decisions. I would give NATO and the Biden Administration high marks for lowering the risk of nuclear war AND supporting Ukraine and Taiwan. When Putin and his minions were making reckless and dangerous nuclear threats, we were calm and did not change our nuclear posture. This approach seems to be working.

How much do you think the risk of nuclear war would increase over the century if Iran acquired nuclear weapons? And what measures, if any, do you think are appropriate to attempt to prevent this or other examples of nuclear proliferation?

Joel, I’m highly confident Iran will not acquire a nuclear weapon. The U.S. and Israel have exquisite intelligence on the Iranian nuclear program, which has been a high priority for decades. Should Supreme Leader Ali Khamenei change his policy and pursue a nuclear weapon, we would know.

During my time in government I was involved in convincing the Israeli government not to launch a military strike against Iranian nuclear facilities. I was also involved in developing the military capabilities needed if Iran did opt for nuclear weapons, and these capabilit... (read more)

3
jablevine
6mo
Yeah, the conflict in Laascaanood is a bit of a damper. But the rebels control maybe 15% of the country's land area, and ~5% of its population.[1] Further, Somaliland has never really asserted its sovereignty over the city,[2] and it's not particularly important.[3] It wasn't clear in Phillips why Somaliland attempted to include the Sool region in their secession from Somalia, as it voted against the constitution in a referendum. This current flare-up is a continuation of the longer [border conflict with Puntland](https://en.wikipedia.org/wiki/Puntland–Somaliland_dispute). I'm generally confused by this conflict. My main thought, different from what I wrote above, is that it's an indicator that the Isaaq majority is more willing to assert stronger political authority, weakening the clan-based power sharing structure.
----------------------------------------
1. These are really rough guesses. Would be happy to see good sources. ↩︎
2. And neither did Puntland: "In many respects, Laasaanood seems to be part of the Puntland state of Somalia. [...] In Garoowe it becomes clear that Laascaanood is perceived as the political periphery and people there are not fully trusted by officials in the capital of Puntland." (Hoehne, 104) ↩︎
3. I think it would be the seventh or eighth largest city in Somaliland, in a largely un-urbanized country. ↩︎

“Thank you for the comment. There’s a lot here. Could you highlight what you think the main takeaway is? I don’t have time to dig into this at present, so any condensing would be appreciated. Thanks again for the time and effort.” ??

I believe that large tech companies are, on average, more efficient at converting talent into market cap value than small companies or startups are. They typically offer higher salaries, for one.

This may be true for market cap, but let's be careful when translating this to do-goodery. E.g., wages don't necessarily relate to productivity. Higher wages could also reflect higher rents, which seems plausibly self-reinforcing by drawing (and shelving) innovative talent from smaller firms. A quote from a recent paper by Akcigit and Goldschlag (2023) is suggestiv... (read more)

2
Ozzie Gooen
9mo
I'm sure that large companies use monopolistic techniques to get unfair advantages. I remember hearing about this literature before.

I can't help but be a bit suspicious. If decentralization were so productive, wouldn't huge companies notice this and take advantage of it? It would be very easy for large firms to reorganize so that there's very little centralization. I'd be skeptical of the specific hypothesis, "large firms intentionally want their employees to be unproductive," if that's being suggested. I'm similarly suspicious of the "6 to 11 percent" figure. It's very easy for me to imagine that their innovations do better on some measures, like "more likely to influence a lot of people later on, because of integrations."

> I wish I could have started as a researcher with the structure and guidance of a larger organization, but I really doubt I'd have pursued as important research if we hadn't tried to challenge other, larger organizations.

From the standpoint of funders, I'd expect that they would want the "really promising people" to be running and reforming the big institutions, instead of creating new institutions. My main points are for those in charge of things and at the margin. From the standpoint of an individual facing an organization that clearly doesn't seem good to them, I agree it can easily make sense for them to figure it's best to make their own external organization. (I'd also flag that many people I know who left companies to do independent work did projects that really didn't seem that great/scalable to me. I think it's hard to tell.)

I don't think this is right -- "Russia" doesn't take actions, Vladimir Putin does; Putin is 70, so he seems unlikely to be in power once Russia has recovered from the current war; there's some evidence that other Russian elites didn't actively want the war, so I don't think it's right to generalize to "Russia".

Even if it was true that many elites were anti-war before the invasion, I think the war has probably accelerated a preexisting process of ideological purification. So even when Putin kicks the can, I think the elites will be just as likely to say "We d... (read more)

1
alexrichard
9mo
It's unlikely that both the US and China can get mass numbers of supplies to Taiwan. If you can get a (slow/big/vulnerable) freighter to Taiwan you can also almost certainly get an armed military ship, a submarine, or a stealth fighter to Taiwan. I'm not sure what you mean by "local superiority". Virtually every modern anti-ship missile has enough range to completely cover Taiwan. Taiwan is only 150 miles wide, so the LRASM/JSM/YJ-12 etc all have enough range to go from one side to the other, and most of these have enough range to completely cover the island. It's questionable (but plausible) whether a carrier a thousand miles out can survive, let alone a (slow and vulnerable) freighter sailing right up to Taiwan. The absence of complete dominance of the skies means that neither side can safely move around, not that both sides can safely move around. e.g. in Ukraine neither side has complete dominance of the skies, but that certainly doesn't mean that it's safe for either side to be flying cargo planes to the front lines.

Hah! Yeah, stepping back, I think these events are a distraction for most people. Especially if they worsen one's mental health. For me, reflecting on the war makes me feel so grateful and lucky to live where I do. 

Another reason to pay attention is when it seems like it could shortly and sharply affect the chances of catastrophe. At the beginning of the war, I kept asking myself, "At what probability of nuclear war should I: make a plan, consider switching jobs, move to Argentina, etc." But I think we've moved out of the scary zone for a while. 

Fair jabs, but the PRC-Taiwan comparison was because it was the clearest natural experiment that came to mind where different bits of a nation (shared language, culture, etc.) were somewhat randomly assigned to authoritarianism or pluralistic democracy. I'm sure you could make more comparisons with further statistical jiggery-pokery. 

The PRC-Taiwan comparison is also because, imagining we want to think of things in terms of life satisfaction, it's not clear there'd be a huge (war-justifying) loss in wellbeing if annexation by the PRC only meant a rela... (read more)

1
Jamie Elsey
10mo
Thanks for the response and the links to these graphs. This is just a quick look and so could be wrong but looking into some files from the World Values Survey, I find this information which, if correct, would make me think I would not weight this information into my consideration of whether we should be concerned about a country being annexed even to a level of 1% weight. The population of China is ~1.4 billion. The population of Taiwan is ~24 million. The sample size for the Chinese data seems to be 2300 people. And for Taiwan about 1200. I tried to upload a screenshot which I can't work out how to do, but the numbers are in the doc "WV6 Results By Country v20180912" on this page https://www.worldvaluessurvey.org/WVSDocumentationWV6.jsp  I do not think we can have any faith at all that a sample of 2300 people can even come close to representing all the variation in relevant factors related to happiness or satisfaction across the population of China. The ratio of population to respondents is over 600,000, larger than some estimates for the population of Oslo, Glasgow, Rotterdam etc. (https://worldpopulationreview.com/continents/europe/cities) I may be missing something or making some basic error there but if it is roughly correct, then I would indeed call it silly to factor in this survey result when deciding what our response should be to the annexation of Taiwan. I do not think that such a question is in principle about life satisfaction/happiness, but even if it were I would not use this information.

I agree that the agency of newer NATO members (or Ukraine) has been neglected. Still, I don't think this was a primary driver of underestimating Ukraine's chances -- unless I'm missing what "agency" means here. 

I assume predictions were dim about Ukraine's chances at the beginning of the war primarily because Russia and the West had done an excellent job of convincing us that Russia's military was highly capable. E.g., I was disconcerted by the awe/dread with which my family members in the US Army spoke about Russian technical capabilities across mult... (read more)

1
Artūrs Kaņepājs
10mo
Yes, Russia had convinced others, and the FSB had convinced Putin, that its military was much better than it actually was; a key reason why the advances stalled and probably also why Putin launched the war.

But specifically about underestimating Ukraine's chances, I think the "agency" did impact outcomes a lot. The willingness and ability of society to decide and agree on what's best for the country, and to act accordingly, is roughly what I mean by "agency" in this context. Had Zelensky accepted offers to flee, and had UA society and the military accepted the outside views in the first days of the war, then the RU military could have advanced relatively easily, even in the poor condition that it was in.

But resistance had a huge backing from Ukrainians; that is why Zelensky's popularity soared from 27% to 80-90% when he declined offers to flee. It seems likely to me that Putin did not expect that, and instead expected a large part of the population to welcome his soldiers as liberators from the unpopular government. https://iwpr.net/global-voices/zelenskys-approval-ratings-soar-amid-war https://ratinggroup.ua/research/ukraine/obschenacionalnyy_opros_ukraina_v_usloviyah_voyny_26-27_fevralya_2022_goda.html

What’s the track record of secular eschatology? 

A recent SSC blog post depicts a dialogue about Eugenics. This raised the question: how has the track record been for a community of reasonable people to identify the risks of previous catastrophes? 

As noted in the post, at different times: 

  • Many people were concerned about overpopulation posing an existential threat (cf. the population bomb, discussed at length in The Wizard and the Prophet). It now seems widely accepted that the risk overpopulation posed was overblown. But this depends on how conting
... (read more)
6
Kirsten
10mo
Right to worry about nuclear war, based on information later revealed about the Cuban Missile Crisis and other near misses
2
Kirsten
10mo
Unclear if Y2K was fixed or was never really a problem - this article suggests the latter. https://education.nationalgeographic.org/resource/Y2K-bug/#:~:text=Software and hardware companies raced,worked to address the problem.

While there are some Metaculus questions that ask for predictions of the actual risk, the ones I selected are all conditional, of the form "If a global catastrophe occurs, will it be due to X". So they should be more comparable to the RP question "Which of the following do you think is most likely to cause human extinction?"

1
Jamie Elsey
1y
Given the differences in the questions it doesn't seem correct to compare the raw probabilities provided across these - also our question was specifically about extinction rather than just a catastrophe. That being said there may be some truth to this implying some difference between the population estimates and what the metaculus estimates imply if we rank them - AI risk comes out top on the metaculus ratings and bottom in the public, and climate change also shows a sizable rank difference. One wrinkle in taking the rankings like this would be that people were only allowed to pick one item in our questions, and so it is also possible that the rankings could be different if people actually rated each one and then we ranked their ratings. This would be the case if e.g., all the other risks are more likely than AI to be the absolute top risk across people, but many people have AI risk as their second risk, which would suggest a very high ordinal ranking that we can't see from looking at the distribution of top picks.
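To make that last aggregation point concrete, here is a toy illustration (purely hypothetical ratings, not the survey data): a risk can be nobody's single top pick yet still have the highest average rating if it is almost everyone's second pick.

```python
# Toy illustration (hypothetical numbers, not survey data): a risk can rank
# last by "share of people picking it as the single most likely cause" yet
# first by average rating, if it is almost everyone's second pick.
from statistics import mean

# Each respondent rates how likely each risk is to cause extinction (0-10).
respondents = [
    {"Nukes": 9, "AI": 8, "Climate": 2},
    {"Climate": 9, "AI": 8, "Nukes": 2},
    {"Nukes": 9, "AI": 8, "Climate": 2},
    {"Climate": 9, "AI": 8, "Nukes": 2},
    {"Nukes": 9, "AI": 8, "Climate": 2},
]
risks = ["Nukes", "Climate", "AI"]

# Share picking each risk as their single top pick.
top_picks = [max(r, key=r.get) for r in respondents]
share_top = {k: top_picks.count(k) / len(respondents) for k in risks}

# Mean rating per risk.
mean_rating = {k: mean(r[k] for r in respondents) for k in risks}

print(share_top)    # {'Nukes': 0.6, 'Climate': 0.4, 'AI': 0.0}
print(mean_rating)  # {'Nukes': 6.2, 'Climate': 4.8, 'AI': 8.0}
```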

I know this wasn't the goal, but this was the first time I'd seen general polls of how people rank existential risks, and I'm struck by how much the public differs from Rationalists / EAs (using Metaculus and Toby as a proxy). [1] 

| Risk | Public (RP) | Metaculus | Difference |
|---|---|---|---|
| Nukes | 42% | 31% | -11% |
| Climate | 27% | 7% | -20% |
| Asteroid | 9%[2] | ~0% (Toby Ord) | -9% |
| Pandemic | 8% | Natural: 14%, Eng: 28% | 6 to 20% |
| AI | 4% | 46% | 42% |
  1. ^

    It would also be neat to have super-forecaster judgements.

  2. ^

    I can't help but wonder if people took the wrong lessons from Don't Look Up. 

1
Jamie Elsey
1y
Definitely quite a difference (just to check, are the metaculus numbers the likelihood of that risk being picked as the most likely one, not their likelihood ratings?). I was struck, though not surprised, by the very strong political differences for the risks. It suggests to me that some people might also be giving some kind of signalling of 'what we should be most worried about right now' or perhaps even picking what a 'good person' on their side is supposed to pick, as opposed to really carefully sitting and thinking specifically about the most likely thing to cause extinction. That is sort of the opposite way of how I imagine a forecaster would approach such a question.

In general, I agree politeness doesn't require that, but I'd encourage following up in case something got lost in junk mail if the critique could be quite damaging to its subject.

In case it's not obvious, the importance of previewing a critique also depends on the nature of the critique and the relative position of the critic and the critiqued. I think those possibly "punching down" should be more generous and careful than those "punching up". 

The same goes for the implications of the critique “if true”, whether it’s picking nits or questioning whether the organisation is causing net harm. 

That said, I think these considerations only make a difference between waiting one or two weeks for a response and sending one versus several emails to a couple of people if there’s no response the first time. 

2
Jeff Kaufman
1y
I'm not sure I understand this part? If you're sending a draft as a heads up and don't get a response, I don't think politeness requires sending several emails or waiting more than a week?

Hi Alex, I’m heartened to see GiveWell engage with and update based on our previous work! 

[Edited to expand on takeaway]

My overall impression is:

  • This update clearly improves GiveWell’s deworming analysis.
  • Each % point change in deworming cost-effectiveness could affect where hundreds of thousands of dollars are allocated. So getting it right seems important.
  • More work building an empirical prior seems likely to change the estimated decay of income effects and thus deworming's cost-effectiveness, although it’s unclear what direction.
  • Further progress appe
... (read more)
7
GiveWell
1y
Hi, Joel, Alex here, responding to your comment. Thank you for taking the time to give us this feedback! In response to some of your specific points:

* You're right that we should have characterized the results from Lång and Nystedt (2018) as mixed rather than positive. Thanks for pointing out that mistake. We will update the spreadsheet so that study is correctly color-coded, and update the relevant part of the post. With this adjustment, among the studies we looked at, 3 suggest decreasing effects over time, 2 suggest increasing effects over time, and 5 show mixed effects. This still doesn't seem like it adds up to strong evidence for either increasing or decreasing effects, so my prior of a flat effect over time remains the same.
* We excluded Duflo et al. 2021 because it didn't appear to include much about life cycle impacts on income from the intervention. It does report some increases in income for women in the treatment group between 2019 and 2020. However, I'd be reluctant to interpret that as evidence for increases over adulthood, because it represents only one year and because it compares pre-COVID results with results during COVID, which means other factors are probably at play.
* That said, I agree that a more in-depth analysis might lead to a different prior for how we should expect early-life health interventions to affect income over the life cycle. We didn't prioritize an in-depth analysis for this adjustment, but we would be open to more work to create a better-informed prior of deworming's income effects over time. This would require deeper engagement with the studies we looked at to better understand their methodologies, relevance to deworming, and other factors. At the moment, it's not a high-priority project for GiveWell staff, but we're considering an external partnership to explore this further. We imagine that having a better grasp on how income effects change over time could inform our analysis not just of deworming but also of other

Hi John, it’s truly a delight to see someone visually illustrate our work better than we do. Great work!

Great piece. Short and sweet. 

Given the stratospheric karma this post has reached, and the ensuing likelihood it becomes a referenced classic, I thought it'd be a good time to descend to some pedantry. 

"Scope sensitivity" as a phrase doesn't click with me. For some reason, it bounces off my brain. Please let me know if I seem alone in this regard. What scope are we sensitive to? The scope of impact? Also some of the related slogans "shut up and multiply" and "cause neutral" aren't much clearer. "Shut up and multiply" which seems slightly offputti... (read more)

6
SiebeRozendal
10mo
"Scale Matters" ?

Jason, 

You raise a fair point. One we've been discussing internally. Given the recent and expected adjustments to StrongMinds, it seems reasonable to update and clarify our position on AMF to say something like, "Under more views, AMF is better than or on par with StrongMinds. Note that currently, under our model, when AMF is better than StrongMinds, it isn't wildly better.” Of course, while predicting how future research will pan out is tricky, we'd aim to be more specific. 

A high neutral point implies that many people in developing countries believe their lives are not worth living.

This isn't necessarily the case. I assume that if people described their lives as having negative wellbeing, this wouldn't imply they thought their life was not worth continuing. 

  • People can have negative wellbeing and still want to live for the sake of others or causes greater than themselves. 
  • Life satisfaction appears to be increasing over time in low income countries. I think this progress is such that many people who may have negative
... (read more)
2
Jason
1y
Thanks for these points! The idea that people care about more than their wellbeing may be critical here.

I'm thinking of a simplified model with the following assumptions: a mean lifetime wellbeing of 5, SD 2, normal distribution, wellbeing is constant through the lifespan, with a neutral point of 4 (which is shared by everyone).

Under these assumptions, AMF gets no "credit" (except for grief avoidance) for saving the life of a hypothetical person with wellbeing of 4. I'm really hesitant to say that saving that person's life doesn't morally "count" as a good because they are at the neutral point. On the one hand, the model tells me that saving this person's life doesn't improve total wellbeing. On the other hand, suppose I (figuratively) asked the person whose life was saved, and he said that he preferred his existence to non-existence and appreciated AMF saving his life.

At that point, I think the WELLBY-based model might not be incorporating some important data -- the person telling us that he prefers his existence to non-existence would strongly suggest that saving his life had moral value that should indeed "count" as a moral good in the AMF column. His answers may not be fully consistent, but it's not obvious to me why I should fully credit his self-reported wellbeing but give zero credence to his view on the desirability of his continued existence. I guess he could be wrong to prefer his continued existence, but he is uniquely qualified to answer that question and so I think I should be really hesitant to completely discount what he says. And a full 30% of the population would have wellbeing of 4 or less under the assumptions.

Even more concerning, AMF gets significantly "penalized" for saving the life of a hypothetical person with wellbeing of 3 who also prefers existence to non-existence. And almost 16% of the population would score at least that low.

Of course, the real world is messier than a quick model. But if you have a population where the neutra
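A quick sketch of the arithmetic behind the simplified model above (purely illustrative, using only the stated assumptions of a Normal(5, 2) wellbeing distribution and a shared neutral point of 4):

```python
# Illustrative only: the share of the population at or below the neutral point
# under the simplified model above (lifetime wellbeing ~ Normal(mean=5, SD=2),
# neutral point of 4 shared by everyone).
from statistics import NormalDist

wellbeing = NormalDist(mu=5, sigma=2)

share_at_or_below_neutral = wellbeing.cdf(4)  # ~0.31, i.e. roughly 30%
share_at_or_below_three = wellbeing.cdf(3)    # ~0.16, i.e. almost 16%

print(f"{share_at_or_below_neutral:.1%}, {share_at_or_below_three:.1%}")  # 30.9%, 15.9%
```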

2. I don't think 38% is a defensible estimate for spillovers, which puts me closer to GiveWell's estimate of StrongMinds than HLI's estimate of StrongMinds.

I wrote this critique of your estimate that household spillovers was 52%. That critique had three parts. The third part was an error, which you corrected and brought the answer down to 38%. But I think the first two are actually more important: you're deriving a general household spillover effect from studies specifically designed to help household members, which would lead to an overestimate.

... (read more)

My intuition, which is shared by many, is that the badness of a child's death is not merely due to the grief of those around them. Thus the question should not be comparing just the counterfactual grief of losing a very young child VS an [older adult], but also "lost wellbeing" from living a net-positive-wellbeing life in expectation.

I didn't mean to imply that the badness of a child's death is just due to grief. As I said in my main comment, I place substantial credence (2/3rds) in the view that death's badness is the wellbeing lost. Again, this is my view n... (read more)

5
bruce
1y
That makes sense, thanks for clarifying! If I understand correctly, the updated figures should then be:

For 1 person being treated by StrongMinds (excluding all household spillover effects) to be worth the WELLBYs gained for a year of life[1] with HLI's methodology, the neutral point needs to be at least 4.95 - 3.77 = 1.18. If we include spillover effects of StrongMinds (and use the updated / lower figures), then the benefit of 1 person going through StrongMinds is 10.7 WELLBYs.[2] Under HLI's estimates, this is equivalent to more than two years of wellbeing benefits from the average life, even if we set the neutral point at zero. Using your personal neutral point of 2 would suggest the intervention for 1 person including spillovers is equivalent to >3.5 years of wellbeing benefits. Is this correct or am I missing something here?

1.18 as the neutral point seems pretty reasonable, though the idea that 12 hours of therapy for an individual is worth the wellbeing benefits of 1 year of an average life when only considering impacts to them, and anywhere between 2~3.5 years of life when including spillovers, does seem rather unintuitive to me, despite my view that we should probably do more work on subjective wellbeing measures on the margin. I'm not sure if this means:

1. WELLBYs as a measure can't capture what I care about in a year of healthy life, so we should not use solely WELLBYs when measuring wellbeing;
2. HLI isn't applying WELLBYs in a way that captures the benefits of a healthy life;
3. The existing way of estimating 1 year of life via WELLBYs is wrong in some other way (e.g. the 4.95 assumption is wrong, the 0-10 scale is wrong, the ~1.18 neutral point is wrong);
4. HLI have overestimated the benefits of StrongMinds;
5. I have a very poorly calibrated view of how good / bad 12 hours of therapy / a year of life is, though this seems less likely.

Would be interested in your thoughts on this / let me know if I've misinterpreted anything!

1.
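For concreteness, here is the arithmetic from the reply above as a small sketch. These are the figures quoted in the comment (4.95 average life satisfaction, 3.77 WELLBYs per treated recipient, 10.7 WELLBYs including spillovers); it is not a re-derivation of HLI's model.

```python
# Worked version of the arithmetic in the reply above, using the figures quoted
# there (the comment's numbers, not HLI's underlying model).
avg_life_satisfaction = 4.95    # assumed average life satisfaction, 0-10 scale
wellbys_individual = 3.77       # WELLBYs per treated recipient, no spillovers
wellbys_with_spillovers = 10.7  # WELLBYs per recipient including household spillovers

# Neutral point at which one treatment equals one year of average life:
neutral_point_breakeven = avg_life_satisfaction - wellbys_individual  # 1.18

def years_of_average_life(wellbys, neutral_point):
    """Express WELLBYs as years lived at the average life-satisfaction level."""
    return wellbys / (avg_life_satisfaction - neutral_point)

print(round(neutral_point_breakeven, 2))                              # 1.18
print(round(years_of_average_life(wellbys_with_spillovers, 0), 2))    # ~2.16 years (neutral point 0)
print(round(years_of_average_life(wellbys_with_spillovers, 2), 2))    # ~3.63 years (neutral point 2)
```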

To be clear on what the numbers are: we estimate that group psychotherapy has an effect of 10.5 WELLBYs on the recipient's household, and that the death of a child in a LIC has a -7.3 WELLBY effect on the bereaved household. But the estimate for grief was very shallow. The report this estimate came from was not focused on making a cost-effectiveness estimate of saving a life (with AMF). Again, I know this sounds weasel-y, but we haven't yet formed a view on the goodness of saving a life so I can't say how much group therapy HLI thinks is preferable avertin... (read more)

5
bruce
1y
Thanks Joel. My intuition, which is shared by many, is that the badness of a child's death is not merely due to the grief of those around them. So presumably the question should not be comparing just the counterfactual grief of losing a very young child VS an [older adult], but also "lost wellbeing" from living a net-positive-wellbeing life in expectation? I also just saw that Alex claims HLI "estimates that StrongMinds causes a gain of 13 WELLBYs". Is this for 1 person going through StrongMinds (i.e. ~12 hours of group therapy), or something else? Where does the 13 WELLBYs come from? I ask because if we are using HLI's estimates of WELLBYs per death averted, and use your preferred estimate for the neutral point, then 13 / (4.95-2) is >4 years of life. Even if we put the neutral point at zero, this suggests 13 WELLBYs is worth >2.5 years of life.[1] I think I'm misunderstanding something here, because GiveWell claims "HLI’s estimates imply that receiving IPT-G is roughly 40% as valuable as an additional year of life per year of benefit or 80% of the value of an additional year of life total." Can you help me disambiguate this? Apologies for the confusion. 1. ^ 13 / 4.95
0
LGS
1y
I appreciate your candid response. To clarify further: suppose you give a mother a choice between "your child dies now (age 5), but you get group therapy" and "your child dies in 60 years (age 65), but no group therapy". Which do you think she will choose? Also, if you don't mind answering: do you have children? (I have a hypothesis that EA values are distorted by the lack of parents in the community; I don't know how to test this hypothesis. I hope my question does not come off as rude.)

I'd point to the literature on time-lagged correlations between household members' emotional states that I quickly summarised in the last installment of the household spillover discussion. I think it implies a household spillover of 20%. But I don't know if this type of data should over- or underestimate the spillover ratio relative to what we'd find in RCTs. I know I'm being really slippery about this, but the Barker et al. analysis so far makes me think it's larger than that. 

I find nothing objectionable in that characterization. And if we only had these three studies to guide us then I'd concede that a discount of some size seems warranted. But we also have A. our priors and B. some new evidence from Barker et al. Both of these point me away from very small spillovers, but again I'm still very unsure. I think I'll have clearer views once I'm done analyzing the Barker et al. results and have had someone, ideally Nathanial Barker, check my work. 

[Edit: Michael edited to add: "It's not clear any specific number away from 0 could ... (read more)

2
MichaelStJules
1y
To clarify what I edited in, I mean that, without better evidence/argument, the effect could be arbitrarily small but still nonzero. What reason do we have to believe it's at least 1%, say, other than very subjective priors? I agree that analysis of new evidence should help.

Joel’s response

[Michael's response below provides a shorter, less-technical explanation.]  

Summary 

Alex’s post has two parts. First, what is the estimated impact of StrongMinds in terms of WELLBYs? Second, how cost-effective is StrongMinds compared to the Against Malaria Foundation (AMF)? I briefly present my conclusions to both in turn. More detail about each point is presented in Sections 1 and 2 of this comment.

The cost-effectiveness of StrongMinds

GiveWell estimates that StrongMinds generates 1.8 WELLBYs per treatment (17 WELLBYs per $100... (read more)

3
ClimateDoc
1y
Regarding the question of what philosophical view should be used, I wonder if it would also matter if someone were something like prioritarian rather than a total utilitarian. StrongMinds looks to focus on people who suffer more than typical members of these countries' populations, whilst the lives saved by AMF would presumably cover more of the whole distribution of wellbeing. So a prioritarian may favour StrongMinds more, assuming the people helped are not substantially better off economically or in other ways. (Though, it could perhaps also be argued that the people who would die without AMF's intervention are extremely badly off pre-intervention.) 

Thank you for this detailed and transparent response!

I applaud HLI for creating a chart (and now an R Shiny App) to show how philosophical views can affect the tradeoff between predominately life-saving and predominately life-enhancing interventions. However, one challenge with that approach is that almost any changes to your CEA model will be outcome-changing for donors in some areas of that chart. [1]

For example, the 53-> 38% correction alone switched the recommendation for donors with a deprivationist framework who think the neutral point is ove... (read more)

LGS
1y

Zooming out a little: is it your view that group therapy increases happiness by more than the death of your child decreases it? (GiveWell is saying that this is what your analysis implies.)

(EDITED)

Is this (other than 53% being corrected to 38%) from the post accurate?

Spillovers: HLI estimates that non-recipients of the program in the recipient’s household see 53% of the benefits of psychotherapy from StrongMinds and that each recipient lives in a household with 5.85 individuals.[11] This is based on three studies (Kemp et al. 2009, Mutamba et al. 2018a, and Swartz et al. 2008) of therapy programs where recipients were selected based on negative shocks to children (e.g., automobile accident, children with nodding syndrome, children with psych

... (read more)

Joel from HLI here, 

Alex kindly shared a draft of this report and discussed feedback from Michael and me more than a year ago. He also recently shared this version before publication. We're very pleased to finally see that this is published! 

We will be responding in more (maybe too much) detail tomorrow. I'm excited to see more critical discussion of this topic. 

Edit: the response (Joel's, Michael's, Sam's) has arrived. 

I'd assume that 1. you don't need the whole household; depending on the original sample size, it seems plausible to randomly select a subset of household members[1] (e.g., in house A you interview the recipient and son, in B the recipient and partner, etc.), and 2. they wouldn't need to consent to participate, just to be surveyed, no? 

If these assumptions didn't hold, I'd be more worried that this would introduce nettlesome selection issues. 

  1. ^

    I recognise this isn't necessarily as simple as I make it out to be. I expect you'd need to be more careful w

... (read more)

Given that this post has been curated, I wanted to follow up with a few points I’d like to emphasise that I forgot to include in the original comment.

  • To my knowledge, we were the first to attempt to estimate household spillovers empirically. In hindsight, it shouldn't be too surprising that it’s been a messy enterprise. I think I've updated towards "messiness will continue". 
  • One hope of ours in the original report was to draw more attention to the yawning chasm of good data on this topic. 
    • "The lack of data on household effects seems like a gap in
... (read more)
2
Guy Raveh
1y
Is it as easy (or easy enough) to enroll participants in RCTs if you need their whole household, rather than just them, to consent to participate? Does it create any bias in the results?

Neat! Cover jacket could use a graphic designer in my opinion. It's also slotted under engineering? Am I missing something?

4
Holden Karnofsky
1y
I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

Dear Srdjan, 

I think we do address the potential for negative impacts. As we say in section 2.2 (and elaborate on in Appendix B.3):

 "From 11 studies we estimate that a 1% increase in immigrants as a share of the population is associated with a (non-significant) decrease of -0.004 SDs of SWB (or -0.008 WELLBYs) for the native population."

Additionally, we have a subsection (3.2) called "risk of backlash effects". Again, these may not be the concerns you have in mind, but to say we're only mentioning positive effects is wrong. We mention throughout t... (read more)

James courteously shared a draft of this piece with me before posting. I really appreciate that, as well as his substantive, constructive feedback.

1. I blundered

The first thing worth acknowledging is that he pointed out a mistake that substantially changes our results. And for that, I’m grateful. It goes to show the value of having skeptical external reviewers.

He pointed out that Kemp et al. (2009) finds a negative effect, while we recorded its effect as positive — meaning we coded the study as having the wrong sign.

What happened is that MH outcomes are often "hi... (read more)

Jason
1y

Strong upvote for both James and Joel for modeling a productive way to do this kind of post -- show the organization a draft of the post first, and give them time to offer comments on the draft + prepare a comment for your post that can go up shortly after the post does.

The guess is based on a recent (unpublished and not sure I can cite) survey that I think did the best job yet at eliciting people's views on the neutral point in three countries (two LMICs).

I agree it's a big ask to get people to use the exact same scales. But I find it reassuring that populations whom we wouldn't be surprised to have the best and worst lives tend to rate themselves as having about the best and worst lives that a 0 to 10 scale allows (Afghans at ~2/10 and Finns at ~8/10).

That's not to dismiss the concern. I think it's plausible that there... (read more)

Trying to hold onto the word “eugenics” seems to indicate an unrealistically optimistic belief in people’s capacity to tolerate semantics. Letting go is a matter of will, not reason. 

E.g., I pity the leftist who thinks they can, in every conversation with a non-comrade, explain the difference between the theory of a classless society, the history of ostensibly communist regimes committing omnicide, and the hitherto unrealised practice of “real communism” (outside of a few scores of 20th-century Israeli villages and towns). To avoid the reverse problem... (read more)

2
Stuart Armstrong
1y
Ah, you made the same point I did, but better :-)

A note on the "positive utility" bit. I am very uncertain about this. We don't really know where on subjective wellbeing scales people construe wellbeing to go from positive to negative. My best bet is around 2.5 on a 0 to 10 scale. This would indicate that ~18% of people in SSA or South Asia have lives with negative wellbeing if what we care about is life satisfaction (debatable). For the world, this means 11%, which is similar to MacAskill's guess of 10% in WWOTF. 

And insofar as happiness is separate from life satisfaction, it's very rare for a count... (read more)

4
Amber Dawn
1y
This is interesting! What is your guess of 2.5/10 based on? I guess this fuzziness makes me feel innately sceptical about such scales - I think one can get well-calibrated at tracking mood or wellbeing with numbers, but I think if you just ask a person who hasn't done this, I wouldn't expect Person A's 5  and Person B's 5 to be the same. 

I haven't downvoted or read the post, but one explanation is that the title "You're probably a eugenicist" seems clickbaity and aimed at persuasion. It reads as ripe for plucking out of context by our critics. I immediately see it cited in the next major critique published in a major news org: "In upvoted posts on the EA forum, EAs argue they can have 'reasonable' conversations about eugenics."

One idea for dealing with controversial ideas is to A. use a different word and/or B. make it more boring. If the title read something like "Most people favor selecting for valuable hereditary traits", my pulse would quicken less upon reading.

9
Ariel Simnegar
1y
Couldn’t agree more. I think clickbaity pulse-quickening titles are more forgivable for blog posts on a personal page (which this post originally was) than on the EA forum. I’d recommend that Sentientist modify the title they use for the crosspost to this forum.

I don't see this as much of an update. Mutual inspections under the treaty haven't taken place for a year; it's basically already been suspended since the invasion. I would be more concerned if he formally withdrew, but he didn't even do that. 

In retrospect, I think my reply didn't do enough to acknowledge that A. using a different starting value seems reasonable and B. this would lead to a much smaller change in cost-effectiveness for deworming. While very belated, I'm updating the post to note this for posterity. 

I agree that advocacy for high-skilled immigration is more likely to succeed, and that the benefits would probably come more from technological and material progress. The problem is we currently aren't prepared to try and estimate the benefits of these society- and world-wide spillover effects. 

Maybe we will return to this if (big if) we explore policies that may cost-effectively increase GDP growth (which some argue is = tech progress in the long run?), and through that subjective wellbeing[1]. 

Regarding Malengo, I asked Johannes a fe... (read more)

3
Richard Nerland
1y
First, at Malengo the students fully fund the next cohort via repaying the original donation in an ISA. This means that funding 1 student will actually fund many students over time. Using the numbers above you get a rate of return around 6% annualized. So funding a student is sorta infinite students at a 0% discount rate. But that is unreasonable, so let's just cap at the next 100 years and say a 2% discount rate from inflation. BOTEC: 1 funding pays for 12.5 students, or a student every 8 years. That changes your calculation from 3x GiveDirectly to 37.5x.

Second, you also said the students are richer, but that is factually incorrect; the program is means-tested to ensure that students are well targeted.

Finally, there are other fudge factors, but they are all dwarfed by the development benefits of immigration. https://www.nber.org/papers/w29862 This shows that nearly 80% of long-run income gains are accrued within sending countries across a wide variety of channels. Hence, I think 37.5x GiveDirectly is a completely reasonable estimate.
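A minimal sketch of the BOTEC as stated above, using the comment's own assumptions (a funded place recycles into a new student roughly every 8 years, capped at a 100-year horizon, applied to the 3x GiveDirectly baseline; the 2% discount rate mentioned is not applied to this headline count):

```python
# Minimal sketch of the BOTEC as stated above (the commenter's numbers, not a
# vetted model): one funded place recycles via ISA repayments into a new
# student roughly every 8 years, capped at a 100-year horizon.
years_per_student = 8    # assumed repayment/recycling cycle
horizon_years = 100      # assumed cap on how long the donation keeps recycling
base_multiplier = 3.0    # original estimate: 3x GiveDirectly

students_funded = horizon_years / years_per_student       # 12.5 students
adjusted_multiplier = base_multiplier * students_funded   # 37.5x GiveDirectly

print(students_funded, adjusted_multiplier)  # 12.5 37.5
```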

Hi David, I'm excited about this! It certainly seems like a step in the right direction. A few vague questions that I'm hoping you'll divine my meaning from:

  • Maybe this is redundant with Gideon's question, but I'd like to press further on this. What is the "depth" of research you think is best suited for the Unjournal? It seems like the vibe is "at least econ working paper level of rigor". But it seems like a great amount of EA work is shallower, or more weirdly formatted than a working paper. I.e., Happier Lives Institute reports are probably a bit below th
... (read more)
2
david_reinstein
1y
I'm hoping these will be very useful, if we scale up enough. I also want to work to make these scores more concretely grounded and tied to specific benchmarks and comparison groups. And I hope to do better to operationalize specific predictions,[1] and use well-justified tools for aggregating individual evaluation ratings into reliable metrics (e.g., potentially partnering with initiatives like RepliCATS).

Not sure precisely what you mean by 'control for the depth'. The ratings we currently elicit are multidimensional, and the depth should be captured in some of these. But these were basically a considered first pass; I suspect we can do better to come up with a well-grounded and meaningful set of categories to rate. (And I hope to discuss this further and incorporate the ideas of particular meta-science researchers and initiatives.)
----------------------------------------
1. For things like citations, measures of impact, replicability, votes X years on 'how impactful was this paper' etc., perhaps leveraging prediction markets. ↩︎
8
david_reinstein
1y
In the first stage, that is the idea. In the second stage, I propose to expand into other tracks.

Background: The Unjournal is trying to do a few things, and I think there are synergies (see the Theory of Change sketch here):

1. Make academic research evaluation better, more efficient, and more informative (but focusing on the 'impactful' part of academic research)
2. Bring more academic attention and rigor to impactful research
3. Have academics focus on more impactful topics, and report their work in a way that makes it more impactful

For this to have the biggest impact, changing the systems and leveraging this, we need academics and academic reward systems to buy into it. It needs to be seen as rigorous, serious, ambitious, and practical. We need powerful academics and institutions to back it. But even for the more modest goal of getting academic experts to (publicly) evaluate niche EA-relevant work, it's still important to be seen as serious, rigorous, credible, etc. That's why we're aiming for the 'rigor stuff' for now, and will probably want to continue this, at least as a flagship tier/stream, into the future.

> But it seems like a great amount of EA work is shallower, or more weirdly formatted than a working paper. I.e., Happier Lives Institute reports are probably a bit below that level of depth (and we spend a lot more time than many others) and GiveWell's CEAs have no dedicated write-ups. Would either of these research projects be suitable for the Unjournal?

Would need to look into specific cases. My guess is that your work largely would be, at least 1. if and when we launch the second stream and 2. for the more in-depth stuff that you might think "I could submit this to a conventional journal but it's too much hassle".

> GiveWell's CEAs have no dedicated write-ups.

I think they should have more dedicated write-ups (or other presentation formats), and perhaps more transparent formats, with clear reasoning and transparent justifications for their choices, a