

Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.

Feel free to reach out if you think there's anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you're a medical student / junior doctor reconsidering your clinical future, or if you're quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.

Outside of EA, I do a bit of end-of-life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed up classical violinist and Oly-lifter.

All comments in personal capacity unless otherwise stated.



Thanks for writing this post!

I feel a little bad linking to a comment I wrote, but the thread is relevant to this post, so I'm sharing in case it's useful for other readers, though there's definitely a decent amount of overlap here.


I personally default to being highly skeptical of any mental health intervention that claims a ~95% success rate + a PHQ-9 reduction of 12 points over 12 weeks, as this is a clear outlier among treatments for depression. The effectiveness figures from StrongMinds are also based on studies that are non-randomised and poorly controlled. There are other questionable methodological issues, e.g. surrounding adjusting for social desirability bias. The topline figure of $170 per head for cost-effectiveness is also possibly an underestimate: while ~48% of clients were treated through SM partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients treated through partners in 2022, the expenses and operating costs of the partners responsible for these clients were not included in the methodology.
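As a rough illustration of the direction of this bias (the $170 figure and ~79% partner share come from the documents discussed above; the partner cost per head below is invented purely for illustration):

```python
# Sketch: if partners' operating costs are excluded from the numerator while
# their clients are counted in the denominator, the reported cost per person
# understates the all-in cost. partner_cost_per_head is a made-up placeholder.
reported_cost_per_head = 170   # StrongMinds' own expenses per client treated
partner_share = 0.79           # est. fraction of clients treated via partners (2022)
partner_cost_per_head = 100    # hypothetical partner expense per partner-treated client

all_in_cost_per_head = reported_cost_per_head + partner_share * partner_cost_per_head
print(all_in_cost_per_head)  # → 249.0
```

Whatever the true partner expense turns out to be, any positive value pushes the all-in figure above the reported $170.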

(This mainly came from a cursory review of StrongMinds documents, and not from examining HLI analyses, though I do think "we’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money" seems a little overconfident. This is also not a comment on the appropriateness of recommendations by GWWC / FP)


(commenting in personal capacity etc)


Links to existing discussion on SM. Much of this ends up touching on discussions around HLI's methodology / analyses as opposed to the strength of evidence in support of StrongMinds, but including as this is ultimately relevant for the topline conclusion about StrongMinds (inclusion =/= endorsement etc):


While I agree that both sides are valuable, I share the anon's view here - I don't think these tradeoffs are particularly relevant to a community health team investigating interpersonal harm cases with the goal of "reduc[ing] risk of harm to members of the community while being fair to people who are accused of wrongdoing".

One downside of having the bad-ness of, say, sexual violence[1] be mitigated by the accused's perceived impact (how is the community health team actually measuring this? How good someone's forum posts are? Whether they work at an EA org? Whether they are "EA leadership"?) when considering what the appropriate action should be (if this is happening) is that it plausibly leads to different standards for bad behaviour. By the community health team's own standards, taking someone's potential impact into account as a mitigating factor seems like it could increase the risk of harm to members of the community (by not taking sufficient action, with the justification of perceived impact), while being more unfair to people who are accused of wrongdoing. To be clear, I'm basing this off the forum post, not any non-public information.

Additionally, a common theme about basically every sexual violence scandal that I've read about is that there were (often multiple) warnings beforehand that were not taken seriously.

If there is a major sexual violence scandal in EA in the future, it will be pretty damning if the warnings and concerns were clearly raised, but the community health team chose not to act because they decided it wasn't worth the tradeoff against the person/people's impact.

Another point is that people who are considered impactful are likely to be somewhat correlated with people who have gained respect and power in the EA space, have seniority or leadership roles etc. Given the role that abuse of power plays in sexual violence, we should be especially cautious of considerations that might indirectly favour those who have power.

More weakly, even if you hold the view that it is in fact the community health team's role to "take the talent bottleneck seriously; don’t hamper hiring / projects too much" when responding to say, a sexual violence allegation, it seems like it would be easy to overvalue the bad-ness of the immediate action against the person's impact, and undervalue the bad-ness of many more people opting to not get involved, or distance themselves from the EA movement because they perceive it to be an unsafe place for women, with unreliable ways of holding perpetrators accountable.

That being said, I think the community health team has an incredibly difficult job, and while they play an important role in mediating community norms and dynamics (and thus have a corresponding amount of responsibility), it's always easier to make comments of a critical nature than to make the difficult decisions they have to make. I'm grateful they exist, and don't want my comment to come across like an attack on the community health team or its individuals!

(commenting in personal capacity etc)

  1. ^

    used as an umbrella term to include things like verbal harassment. See definition here.


If this comment is more about "how could this have been foreseen", then this comment thread may be relevant. I should note that hindsight bias means it's much easier to look back and assess problems as obvious and predictable ex post, when powerful investment firms and individuals who had skin in the game also missed this.

1) There were entries that were relevant (this one also touches on it briefly)
2) They were specifically mentioned
3) There were comments relevant to this. (notably one of these was apparently deleted because it received a lot of downvotes when initially posted)
4) There have been at least two other posts on the forum prior to the contest that engaged with this specifically

My tentative take is that these issues were in fact identified by various members of the community, but there isn't a good way of turning identified issues into constructive actions - the status quo is that we just have to trust that organisations have good systems in place for this, and that EA leaders are sufficiently careful and willing to make changes or consider them seriously, such that all the community needs to do is "raise the issue". I think investigations or accountability questions going forward should focus on the systems within the relevant EA orgs or leadership - all individuals are fallible, and we should be looking at how to build systems such that the community doesn't have to simply trust that the people who have power and are steering the EA movement will get it right, and such that there are ways for the community to hold them accountable to their ideals or stated goals if this appears not to be playing out in practice, or risks not doing so.

i.e. if there are good processes and systems in place, and documentation of those processes and decisions, it's more acceptable (because other organisations that probably have very good due diligence processes also missed it). But if there weren't good processes, or if these decisions weren't careful + intentional, then that's comparatively more concerning, especially in the context of specific criticisms that have been raised,[1] or previous precedent. For example, I'd be especially curious about the events surrounding Ben Delo,[2] and the processes that were implemented in response. I'd be curious whether there are people in EA orgs involved in steering who keep track of potential risks and early warning signs to the EA movement, in the same way the EA community advocates for in the case of pandemics, AI, or even general ways of finding opportunities for impact. For example, SBF, who is listed as an EtG success story on 80k hours, publicly stated he was willing to go 5x over the Kelly bet, and described yield farming in a way that Matt Levine interpreted as a Ponzi. Again, I'm personally less interested in the object-level decision (e.g. whether or not we agree that SBF's Kelly bet comments were serious, or that Levine's interpretation was appropriate), and more in what the process was, and how this was considered at the time with the information they had. I'd also be curious about the documentation of any SBF-related concerns that were raised by the community, if any, and how these concerns were managed and considered (as opposed to critiquing the final outcome).

Outside of due diligence and ways to facilitate whistleblowers, decision-making processes around the steering of the EA movement are crucial as well. When decisions are made by orgs that bring clear benefits to one part of the EA community while bringing clear risks that are shared across wider parts of the EA community,[3] it would probably be of value to look at how these decisions were made and what tradeoffs were considered at the time. Going forward, it would be worth thinking about how to either diversify those risks, or make decision-making more inclusive of a wider range of stakeholders,[4] keeping in mind the best interests of the EA movement as a whole.

(this is something I'm considering working on in a personal capacity along with the OP of this post, as well as some others - details to come, but feel free to DM me if you have any thoughts on this. It appears that CEA is also already considering this)

If this comment is about "are these red-teaming contests in fact valuable for the money and time put into it, if it misses problems like this"

I think my view here (speaking only for the red-teaming contest) is that even if this specific contest was framed in a way that it missed these classes of issues, the value of the very top submissions[5] may still have made the efforts worthwhile. The potential value of a different framing was mentioned by another panelist. If it's the case that red-teaming contests are systematically missing this class of issues regardless of framing, then I agree that would be pretty useful to know, but I don't have a good sense of how we would try to investigate this.


  1. ^

This tweet seems to have aged particularly well. Despite supportive comments from high-profile EAs on the original forum post, the author seemed disappointed that nothing came of it in that direction. Again, without getting into the object-level discussion of the claims of the original paper, it's still worth asking questions about the processes. If there were actions planned, what did these look like? If not, was that because of a disagreement over the suggested changes, or over the extent to which it was an issue at all? How were these decisions made, and what was considered?

  2. ^

Apparently a previous EA-aligned billionaire (and possibly donor?) who got rich by starting a crypto trading firm, and who pleaded guilty to violating the Bank Secrecy Act

  3. ^

    Even before this, I had heard from a primary source in a major mainstream global health organisation that there were staff who wanted to distance themselves from EA because of misunderstandings around longtermism.

  4. ^

    This doesn't have to be a lengthy deliberative consensus-building project, but it should at least include internal comms across different EA stakeholders to allow discussions of risks and potential mitigation strategies.

  5. ^

As requested, here are some submissions that I think are worth highlighting, or that I considered awarding but that ultimately did not make the final cut. (This list is non-exhaustive, and should be taken more lightly than the Honorable mentions, because by definition these posts are less strongly endorsed by those who judged them. Also commenting in personal capacity, not on behalf of other panelists, etc):

Bad Omens in Current Community Building
I think this was a good-faith description of some potential / existing issues that are important for community builders and the EA community, written by someone who "did not become an EA" but chose to go to the effort of providing feedback with the intention of benefitting the EA community. While these problems are difficult to quantify, they seem important if true, and pretty plausible based on my personal priors/limited experience. At the very least, this starts important conversations about how to approach community building that I hope will lead to positive changes, and a community that continues to strongly value truth-seeking and epistemic humility, which is personally one of the benefits I've valued most from engaging in the EA community.

Seven Questions for Existential Risk Studies
It's possible that the length and academic tone of this piece detracts from the reach it could have, and it (perhaps aptly) leaves me with more questions than answers, but I think the questions are important to reckon with, and this piece covers a lot of (important) ground. To quote a fellow (more eloquent) panelist, whose views I endorse: "Clearly written in good faith, and consistently even-handed and fair - almost to a fault. Very good analysis of epistemic dynamics in EA." On the other hand, this is likely less useful to those who are already very familiar with the ERS space.

Most problems fall within a 100x tractability range (under certain assumptions)
I was skeptical when I read this headline, and while I'm not yet convinced that 100x tractability range should be used as a general heuristic when thinking about tractability, I certainly updated in this direction, and I think this is a valuable post that may help guide cause prioritisation efforts.

The Effective Altruism movement is not above conflicts of interest
I was unsure about including this post, but I think this post highlights an important risk of the EA community receiving a significant share of its funding from a few sources, both for internal community epistemics/culture considerations as well as for external-facing and movement-building considerations. I don't agree with all of the object-level claims, but I think these issues are important to highlight and plausibly relevant outside of the specific case of SBF / crypto. That it wasn't already on the forum (afaict) also contributed to its inclusion here.

I'll also highlight one post that was awarded a prize, but I thought was particularly valuable:

Red Teaming CEA’s Community Building Work
I think this is particularly valuable because of the unique and difficult-to-replace position that CEA holds in the EA community, and as Max acknowledges, it benefits the EA community for important public organisations to be held accountable (and to a standard that is appropriate for their role and potential influence). Thus, even if listed problems aren't all fully on the mark, or are less relevant today than when the mistakes happened, a thorough analysis of these mistakes and an attempt at providing reasonable suggestions at least provides a baseline to which CEA can be held accountable for similar future mistakes, or help with assessing trends and patterns over time. I would personally be happy to see something like this on at least a semi-regular basis (though am unsure about exactly what time-frame would be most appropriate). On the other hand, it's important to acknowledge that this analysis is possible in large part because of CEA's commitment to transparency.

Congratulations on the pilot!

I just thought I'd flag some initial skepticism around the claim:

Our estimates indicate that next year, we will become 20 times as cost-effective as cash transfers.

Overall I expect it may be difficult for the uninformed reader to know how much they should update based on this post (if at all), but given you have acknowledged many of these (fairly glaring) design/study limitations in the text itself, I am somewhat surprised the team is still willing to extrapolate from 7x to 20x GD within a year. It also requires the team to succeed in increasing effective outreach by 2 OOMs, despite currently having less than 6 months of runway for the organisation.[1] 

I also think this pilot should not give the team "a reasonable level of confidence that [the] adaptation of Step-by-Step was effective", insofar as the claim is that charitable dollars here are cost-competitive with top GiveWell charities, or that there is good reason to believe you will be 2x top GiveWell charities next year (though perhaps you just meant from an implementation perspective, not cost-effectiveness). My current view is that while this might be a reasonable place to consider funding for non-EA funders (or e.g. those specifically interested in mental health, or mental health in India), I'd hope that members of the EA community looking to maximise impact through their donations in the GHD space would update based on higher evidentiary standards than what has been provided in this post, which IMO indicates little beyond feasibility and acceptability (which is still promising and exciting news, and I don't want to diminish this!)

I don't want this to come across as a rebuke of the work the team is trying to do - I am on the record as being excited about more people doing work that uses subjective wellbeing on the margin, and I think this is work worth doing. But I hope the team is mindful that continued overconfident claims in this space may cause people to negatively update and become less likely to fund this work in future, for totally preventable communication-related reasons, and not because wellbeing approaches are bad/not worth funding in principle.

  1. ^

A very crude BOTEC based only on the increased time needed for the 15min/week calls with 10,000 people indicates something like 17 additional guides doing the 15min calls full time, assuming they do nothing but these calls every day. The increase in human resources needed to scale up to reaching 10,000 people is of course much greater than this, even for a heavily WhatsApp-based intervention.

10,000 × 0.25 × 6 × 0.27 ÷ 40 ÷ 6 = 16.875
(people reached × call hours per person per week × weeks × retention ÷ guide hours per week ÷ weeks)
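The BOTEC above can be reproduced as a quick sketch (all inputs are the assumptions stated in this footnote, not verified figures):

```python
# Crude guide-staffing BOTEC: how many full-time guides would the weekly
# 15-minute calls alone require at a scale of 10,000 participants?
people_reached = 10_000
call_hours_per_week = 0.25    # 15-minute weekly call per participant
programme_weeks = 6
retention = 0.27              # assumed fraction still on calls
guide_hours_per_week = 40     # a guide doing nothing but calls

total_call_hours = (people_reached * call_hours_per_week
                    * programme_weeks * retention)
guides_needed = total_call_hours / (guide_hours_per_week * programme_weeks)
print(round(guides_needed, 3))  # → 16.875
```

This counts only call time; onboarding, supervision, and admin would push the true staffing requirement well above ~17 guides.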

Hey Ben! A few quick Qs:

  1. Did the team consider a paid/minimum wage position instead of an unpaid one? How did it decide on the unpaid positions?
  2. Is the theory of change for impact here mainly an "upskill students/early career researchers" thing, or for the benefits to RP's research outputs?
  3. What is RP's current policy on volunteers?
  4. Does RP expect to continue recruiting volunteers for research projects in the future?

I think it is entirely possible that people are being unkind because they updated too quickly on claims from Ben's post that are now being disputed, and I'm grateful that you've written this (ditto chinscratch's comment) as a reminder to be empathetic. That being said, there are also explanations for people being less charitable than you are that are unrelated to unkindness, or to the facts in contention:

I have only heard good things about Nonlinear, outside these accusations

Right now, on the basis of what could turn out to have been a lot of lies, their reputations, friendship futures and careers are at risk of being badly damaged

Without commenting on whether Ben's original post should have been approached better or worded differently or was misleading etc, this comment from the Community Health/Special Projects team might add some useful additional context. There are also previous allegations that have been raised.[1]

Perhaps you are including both of these as part of the same set of allegations, but some may suggest that not being permitted to run sessions / recruit at EAGs, and considering blocking attendance (especially given the reference class of actions that have prompted the various responses you can see here), is qualitatively important and may affect whether commenters are being charitable (as opposed to if they just considered the contents of Ben's post vs Nonlinear (NL)'s response). Of course, this depends on how much you think the Community Health/Special Projects team is trustworthy in its judgement / investigation, or how likely this is all just an information cascade etc.

It seems reasonable to assume that the people at Nonlinear are altruistic people.

It is possible for altruistic people to be poor managers, poor leaders, make bad decisions about professional boundaries, have a poor understanding of power dynamics, or indeed, be abusive. The extent to which people at NL are altruistic is (afaict) not a major point of contention, and it is possible to not update about how altruistic someone is while also wanting to hold them accountable to some reasonable standard like "not being abusive or manipulative towards people you manage".

Instead, as I see it, the main, or at least most upvoted, response here has been to critique stylistic mistakes made in their almost impossible task of refuting very damaging claims from anonymous sources in unknown contexts. 

The claims in question from Alice/Chloe/Ben are not anonymous, the identities of Alice and Chloe are known to the Nonlinear team.

Independent of my personal views on these issues, I do think the pushback around 'stylistic mistakes' are reasonable insofar as people interpret this to be indicative of something concerning about NL's approach towards managing staff / criticism / conflict (1, 2, 3), rather than e.g. just being nitpicky about tone, though I appreciate both interpretations are plausible.


I'd like people to imagine what they would do in a similar situation if they were faced with similar accusations. How would you successfully persuade people that you didn't do the things you were accused of, and that the context was not as portrayed?

I think (much) less is more in this case.[2] There are parts of this current post that feel subjective and not supported by facts, and that may reasonably be interpreted by a cynical outsider as a distraction or a defensive smear campaign. I think these choices are counterproductive (both for a truth-seeking outsider and for NL's own interests), especially given the allegations of frame control and of being retaliatory.

There are other parts that might similarly reasonably be interpreted as ranging from irrelevant (Alice's personal drug use habits), to unproductive (links to Kathy Forth), to misleading (inclusion of photos, inconsistent usage of quotation marks, unnecessary paraphrasing, usage of quotes that miss the full context). I disagreed with the approaches here, though I acknowledge there were competing opinions and I wasn't privy to the internal discussions that led to the decisions.

I think a cleaner version of this would probably have been something like 5 to 10x shorter (not including the appendix), and looked something like:[3]

  • Apology for harms done
  • Acknowledgement of which allegations are seen as the most major (much closer to top 3-5 than all 85)
  • Responses to major allegations, focusing only on factual differences and claims that are backed up by ~irrefutable evidence
  • Charitable interpretations of Alice/Chloe/Ben's position, despite the above factual disagreements (what kinds of things would need to be true for their allegations to be plausibly reasonable or fair from their perspective)
  • Lessons learnt, and things NL will do differently in future (some expression of self-awareness / reflection)
  • An appendix containing a list of unresolved but less critical allegations

Disclaimer: I offered to (and did) help review an early draft, in large part because I expected the NL team to (understandably!) be in panic mode after Ben's post/getting dogpiled, and I wanted further community updates to be based on as much relevant information as was possible.

  1. ^

    This footnote added in response to Jeff's comment: I agree that it's likely not double counting, because the story there appears to be one where Kat left the working relationship, which is inconsistent with the accounts of Alice / Chloe's situations, but also makes it unlikely that the "current employee of NL / Kat" hypothesis is correct.

  2. ^

    Perhaps hypocritical given the length of this comment

  3. ^

    Acknowledging that I have no PR expertise

Can you assure me that Rethink's researchers are independent?

I no longer work at RP, but I thought I'd add a data point from someone who doesn't stand to benefit from your donations, in case it was helpful.

I think my take here is that if my experience doing research with the GHD team is representative of RP's work going forwards, then research independence should not be a reason not to donate.[1] 

My personal impression is that in the work that I / the GHD team have been involved with, I have been afforded the freedom to look for our best guess of what the true answers are, and have personally never felt constrained or pushed into a particular answer that wasn't directly related to interpretation of the research. I have also consistently felt free to push back on lines of research that I felt would be less productive, or to suggest stronger alternatives. I think credit here probably goes both to clients and to the GHD team, though I'm not sure exactly how to attribute it.

I feel less confident about biases that may arise from the research agenda / selection of research questions or worldviews and assumptions of clients, but this could (for example) make one more inclined towards funding RP to do their own independent research, or specifying research you think is particularly important and neglected.

Edit: See thread by Saulius detailing his views.

  1. ^

    Caveats: I can't speak for the teams outside of GHD, and I can't speak for RP's work in 2024. This comment should not be seen as an endorsement of the claim that RP is the best place to donate to all things considered, which obviously is influenced by other variables beyond research independence.

Evidentiary standards. We drew on a large number of RCTs for our systematic reviews and meta-analyses of cash transfers and psychotherapy (42 and 74, respectively). If one holds that the evidence for something as well-studied as psychotherapy is too weak to justify any recommendations, charity evaluators could recommend very little.

A comparatively minor point, but it doesn't seem to me that the claims in Greg's post [more] are meaningfully weakened by whether or not psychotherapy is well-studied (as measured by how many RCTs HLI has found on it, noting that you already push back on some object level disagreement on study quality in point 1, which feels more directly relevant).

It also seems pretty unlikely that psychotherapy being well-studied necessarily means StrongMinds is a cost-effective intervention comparable to current OP / GW funding bars (which is one main point of contention), or that charity evaluators need 74+ RCTs in an area before recommending a charity. Is the implicit claim here that the evidence for StrongMinds being a top charity is stronger than that for AMF, which is (AFAIK) based on fewer than 74 RCTs?[1]

  1. ^

I never worked directly with Meghan when we were colleagues, but my interactions with her were very positive and give me the impression that she would be a great supervisor to work with - infectiously passionate about her research, an excellent communicator, and kind + supportive.
