PhD Student / Board Member @ MIT / EA Singapore
126 · Joined Mar 2021


AI alignment researcher in the Computational Cognitive Science and Probabilistic Computing Groups at MIT. My research sits at the intersection of AI and cognitive science, asking questions like: How can we specify and perform inference over rich yet structured generative models of human decision-making and value formation, in order to accurately infer human goals and values?

Currently a board member of EA Singapore, formerly co-president of Yale EA (2015-2017).


Welcome! To be clear, I do think that Buddhist thought and Kantian thought are more often at odds than in alignment. It's just that Garfield's more careful analysis of the No-Self argument suggests that accepting the emptiness of "Self" doesn't mean doing away with personhood-related concepts like moral responsibility.

That said, you might be interested in Dan Arnold's Brains, Buddhas and Believing, which does try to interpret arguments from the Madhyamaka school as similar to contemporary Kantian critiques of reductionism about the mind.

I really liked this post - appreciated how detailed and constructive it was! As one of the judges for the red-teaming contest, I personally thought this should have gotten a prize, and I think it's unfortunate that it didn't. I've tried to highlight it here in a comment on the announcement of contest winners!

Personal highlights from a non-consequentialist, left-leaning panelist
(Cross-posted from Twitter.)

Another judge for the criticism contest here - figured I would share some personal highlights from the contest as well! I read far fewer submissions than the most active panelists (s/o to them for their hard work!), but given that I hold minority viewpoints in the context of EA (non-consequentialist, leftist), I thought people might find these interesting.

I was initially pretty skeptical of the contest, and its ability to attract thoughtful foundational critiques. But now that the contest is over, I've been pleasantly surprised! 

To be clear, I still think there are important classes of critique missing. I would probably have framed the contest differently to encourage them, perhaps along the lines of what Michael Nielsen suggests here:

It would be amusing to have a second judging panel, of people strongly opposed to EA, and perhaps united by some other ideology. I wouldn't be surprised if they came to different conclusions.

I also basically agree with the critiques made in Zvi's criticism of the contest. All that said, below are some of my favorite (1) philosophical (2) ideological (3) object-level critiques.

(1) Philosophical Critiques

  • Population Ethics Without Axiology: A Framework
    Lukas Gloor's critique of axiological thinking was spot-on IMO. It gets at the heart of why utilitarian EA/longtermism can lead to absurd conclusions, and how contractualist "minimal morality" addresses them. I think if people took Gloor's post seriously, it would strongly affect their views about what it means to "do good" in the first place: In order to "not be a jerk", one need not care about creating future happy people, whereas one probably should care about e.g. (global and intergenerational) justice.
  • On the Philosophical Foundations of EA
    I also liked this critique of several EA arguments for consequentialism made by Will MacAskill and, AFAIK, shared by other influential EAs like Holden Karnofsky and Nick Beckstead. Korsgaard's response to Parfit's argument (against person-affecting views) was new to me!
  • Deontology, the Paralysis Argument and altruistic longtermism
    Speaking of non-consequentialism, this one is more niche, but William D'Alessandro's refutation of Mogensen & MacAskill's "paralysis argument" that deontologists should be longtermists hit the spot IMO. The critique concludes that EAs / longtermists need to do better if they want to convince deontologists, which I very much agree with.

There were also a few other philosophical critiques I was excited to see, though I've yet to read them in full:

(2) Ideological Critiques

I'm distinguishing these from the philosophical critiques in that they are about EA as a lived practice and an actually existing social movement. At least in my experience, the strongest disagreements with EA are generally ideological ones.

Unsurprisingly, there wasn't participation from the most vocal online critics! (Why make EA better if you think it should disappear?) But at least one piece did examine the "EA is too white, Western & male" and "EA is neocolonialist" critiques in depth: 

  • Red-teaming contest: demographics and power structures in EA
    The piece focuses on GiveWell and how it chooses "moral weights" as a case study. It then makes recommendations for democratizing ethical decision-making, power-sharing and increasing relevant geographic diversity.

    IMO this was a highly underrated submission. It should have gotten a prize (at least $5k)! The piece doesn't say this itself, but it points toward a version of the EA movement that is majority non-white and non-Western, which I find both possible and desirable.

There was also a slew of critiques about the totalizing nature of EA as a lived practice (many of which were awarded prizes):

  • Effective altruism in the garden of ends
    I particularly liked this critique for being a first-person account from a (formerly) highly-involved EA about how such totalizing thinking can be really destructive.
  • Notes on Effective Altruism
    I also appreciated Michael Nielsen's critique, which discusses the aforementioned "EA misery trap", and also coins the term "EA judo" for how criticisms of EA are taken to merely improve EA, not discredit it.
  • Leaning into EA Disillusionment
    A related piece is about disillusionment with EA, and how to lean into it. I liked how it creates more space for sympathetic critics of EA with a lot of inside knowledge - including those of us who've never been especially "illusioned" in the first place!

That's it for the ideological critiques. This is the class of critique that felt the most lacking in my opinion. I personally would've liked more well-informed critiques from the Left, whether socialist or anarchist, on terms that EAs could appreciate. (Most such critiques I've seen are either no longer as relevant or feel too uncharitable to be constructive.)

There was one attempt to synthesize leftism and EA, but IMO it was no better than this old piece by Joshua Kissel on "Effective Altruism and Anti-Capitalism". There have also been some fledgling anarchist critiques circulating online that I would love to see written up in more detail.

(And maybe stay tuned for The Political Limits of Effective Altruism, the pessimistic critique I've yet to write about the possibility of EA ever achieving what mass political movements achieve.)

(3) Object-Level Critiques

  • Biological Anchors External Review
    On AI risk, I'd be remiss not to highlight Jennifer Lin's review of the influential Biological Anchors report on AI timelines. I appreciated the arguments against both the neural network anchor and the evolutionary anchor, and have become less convinced by the evolutionary anchor as a prediction for transformative AI by 2100.
  • A Critique of AI Takeover Scenarios
    I also appreciated James Fodor's critique of AI takeover scenarios put forth by influential EAs like Holden Karnofsky and Ajeya Cotra. I share the skepticism about the takeover stories I've seen so far, which have often seemed to me way too quick and subjective in their reasoning.
  • Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg
    And of course, there's Haydn Belfield's cautionary tale about how nuclear researchers mistakenly thought they were in an arms race, and how the same could happen (has happened?) with the race to "AGI".
  • The most important climate change uncertainty
    Outside of AI risk, I was glad to see this piece on climate change get an honorable mention! It dissects the disconnect between the EA consensus and non-EA views on climate risk, and argues for more caution. (Disclosure: This was written by a friend, so I didn't vote on it.)
  • Red Teaming CEA’s Community Building Work
    Finally, I also appreciated this extensive critique of CEA's community-building work. I've yet to read it in full, but it resonates with challenges in working with CEA that I've witnessed while on the board of another EA organization.

There's of course tons more that I didn't get the chance to read. I wish I'd had the time! While the results of the contest won't please everyone - much less the most trenchant EA critics - I still think the world is better for it, and I'm now more optimistic about this particular contest format and incentive scheme than I was previously.

For whatever it's worth, it looks like Carrick himself has chosen to donate $2900 to the Salinas campaign, and to publicly announce it via his official Twitter account:

Today I donated the maximum amount, $2900, to #OR06's @AndreaRSalinas. I earned less than $45k last year, so my money is where my mouth is when I say that I believe she will do an excellent job representing Oregonians in DC. [1/2]

This is a tight race and we must win it not only to get Andrea into office but also to keep Congress blue. Please consider digging deep and donating to her campaign here: https://tinyurl.com/2p8m9nwh. And for those planning to help GOTV, I'm right here with you. [2/2]


I believe we should think in terms of marginal effectiveness rather than offsetting particular harms we (individually or as a community) cause (see the author's "you will have contributed in a small way to this failure" argument). If you want to offset harm that you have done, there's little reason to do so by donating to Salinas rather than doing good in a more effective manner.

I have no involvement in the Oregon race, but I disagree with this particular line of reasoning. Even setting aside traditional non-consequentialist arguments for compensating for harm (which I happen to believe in, and which I think are perfectly fine for EAs to act upon while still being EAs), this line of reasoning only works if one adopts causal decision theory.

If we instead adopt functional decision theory, then there are much stronger reasons to consistently act as a harm-compensating agent. In particular, it can disincentivize harmful strategic behavior by others who try to influence you by simulating what you might do in the future. If you cannot be simulated to harm some party without compensating them later, then you cannot be influenced to do so by others. It also enables cooperation with others, who can now trust that you will compensate them for harm (necessary even for everyday economic interactions).
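To make the strategic point a bit more concrete, here is a toy model. It is entirely my own illustrative assumption, not a formal statement of FDT: an influencer gains `gain` if the agent inflicts `harm` on a third party, can read the agent's policy in advance, and offers the smallest bribe that would work. A marginal reasoner treats compensation as a separate, later choice with no marginal payoff, so it can be bought cheaply. An agent whose fixed policy is to compensate can only be bought when the influencer's gain exceeds the harm, and even then the third party is made whole. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    # Hypothetical toy agent. `compensates` is a fixed, simulable policy,
    # not a fresh choice made after the harm has already happened.
    compensates: bool


def agent_accepts(agent: Agent, bribe: float, harm: float) -> bool:
    """Would the agent take `bribe` to inflict `harm` on a third party?

    The marginal reasoner treats compensation as a separate, later decision
    with no marginal payoff, so its expected cost is zero. The committed
    compensator knows it will pay `harm` back afterwards, so that cost is
    netted against the bribe.
    """
    cost = harm if agent.compensates else 0.0
    return bribe - cost > 0


def influencer_bribe(agent: Agent, gain: float, harm: float,
                     eps: float = 0.01) -> Optional[float]:
    """An influencer who can simulate the agent's policy offers the smallest
    bribe that would work, but only if doing so is still profitable."""
    bribe = (harm if agent.compensates else 0.0) + eps
    if agent_accepts(agent, bribe, harm) and gain - bribe > 0:
        return bribe
    return None  # no bribe is offered, so no harm is induced


def third_party_net_loss(agent: Agent, gain: float, harm: float) -> float:
    """What the harmed party is left bearing in this toy model."""
    if influencer_bribe(agent, gain, harm) is None:
        return 0.0  # the harm never happens
    # Harm happens: the compensator pays it back, the marginal reasoner does not.
    return 0.0 if agent.compensates else harm
```

With `gain=1.0` and `harm=5.0`, the marginal agent is bought for the minimum bribe and the third party bears the full harm, while no profitable bribe exists for the compensator. In this sketch, the compensator's protection comes entirely from the influencer being able to read its policy in advance, which is the feature the argument above appeals to.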

I think one could disagree as to whether FDT applies in this case (and also disagree with FDT in general), but I want to push back against the general argument that we should always be marginal thinkers, without consideration for the history of past events.

(S/O to particlemania for having first explained this argument to me. There's also an argument to be made that conventional morality evolved FDT-like characteristics precisely to solve these strategic problems, but I won't get into that here.)

Neat post! I wasn't previously aware of Korsgaard's argument against Parfit, but it strikes me as very resonant with a pragmatic (Madhyamaka) Buddhist response to the non-existence of the Self. As Jay Garfield writes in Freedom, Agency and Ethics for Madhyamikas about the possibility of moral responsibility without "free will":

For a Madhyamika, we have noted, our selves are constructed. They are constructed through the appropriation of aggregates, through recognizing a body as mine, thoughts as mine, values, dispositions, and intentions as mine. In turn, those physical and cognitive processes are also constructed in relation to that self, and it is appropriated by them. That appropriation and narration of a life is, moreover, not a solo affair. We narrate and construct each other constantly in the hermeneutical ensemble act that is social life. [...]

What is it to act? As we noted above, it is for our behavior to be determined by reasons, by motives we and/or others, regard as our own. On a Madhyamaka understanding, it is therefore for the causes of our behavior to be part of the narrative that makes sense of our lives, as opposed to being simply part of the vast uninterpreted milieu in which our lives are led, or bits of the narratives that more properly constitute the lives of others. This distinction is not a metaphysical but a literary distinction, and so a matter of choice, and sensitive to explanatory purposes. That sensitivity, on the other hand, means that the choice is not arbitrary. We can follow Nietzsche here. For what do we take responsibility and for what are we assigned responsibility? Those acts we interpret—or which others interpret for us—as our own, as constituting part of the basis of imputation of our own identities. [...]

From this perspective, a choice occurs when we experience competing motives, consider alternative reasons, some of which could, if dominant, occasion alternative actions, and one set of reasons dominates, causing the action, and caused to cause the action by our background psychological dispositions and other cognitive and conative states. Some actions are expressive of and conducive to virtue, happiness, liberation and the welfare of others and merit praise; others are not. But there need be no more to moral assessment than that. Everything that the post-Augustinian libertarian West buys with the gold coin of the freedom of the will along with all of the metaphysical problems it raises, are bought by the Madhyamika much more cheaply with the paper currency of mere imputation.

I thought I'd share this because Parfit's arguments against personal identity are often viewed as analogues to Buddhist metaphysical arguments against the Self, both of which are often taken to imply something like utilitarianism. (I used to believe something like this!)

But as the above passages highlight, anti-realism about Selves can co-exist with pragmatic fictionalism (what Madhyamaka Buddhists call "conventional truth") about Selves, and so it still makes plenty of sense to talk about "persons", as long as we recognize that the concept of personhood (like all other concepts) is a mere convention (or so Madhyamaka Buddhists argue).

Also, regarding persuading non-consequentialists on their own terms, I've long been meaning to write a post (tentatively) titled "Judicious Duty: Effective Altruism for Non-Consequentialists", so this is giving me additional motivation to eventually do so :)

As someone who leans deontological these days (and contractualist in particular), I really appreciated this post! 

Honestly, I'm quite baffled by the original argument, and it definitely makes me less inclined towards longtermist philosophy and the thinking associated with it. To me it's clear that identity-causing acts do not cause harm in a way that makes one responsible for it, in the same way that unintentionally delaying a robbery does not cause harm in a way that makes one responsible for it, so the paralysis argument feels extremely weird to me.

I think there are good arguments for doing a lot more than we currently do to prevent the foreseeable suffering of future people, but this is not one of those arguments, much less an argument for something like strong longtermism.

I felt this way reading the post as well: "many of the most influential EA leaders" and "many EA leaders" feel overly vague and implicitly normative. Perhaps as a constructive suggestion, we could attempt to list which leaders you mean?

Regarding a 10% or greater chance of human extinction, here are the people I can think of who have expressed something like this view:

  • Toby Ord
  • Will MacAskill
  • 80k leadership
  • OpenPhil leadership

Regarding "primarily concerned with AI safety", it's not clear to me whether this is in contrast to the x-risk portfolio approach that most funders like OpenPhil and FTX and career advisors like 80k are nonetheless taking. If you mean something like "most concerned about AI safety" or "most prioritize AI safety", then this feels accurate for the above list of people.

To the extent possible, I think it'd be especially helpful to list the people or institutions who believe in a 50% chance of extinction, or who estimate AGI in 10 years vs 30 years vs 50 years, and what kind of influence they have.

Loved this post - reminds me a lot of intractability critiques of central economic planning, except now applied to consequentialism writ large.

I'd be curious if you think a weaker version of the "Prevent Possible Harms" principle would solve the issue - perhaps "Prevent Computably Possible Harms" and "Don't Prevent Computably Impossible Harms"? Seems possibly related to debates around normative externalism and the extent to which we need our beliefs to be "objective" to be justified.

I have read the paper, not the book! And have tried to get friends to read it, though unfortunately I don't think it was necessarily very effective either. I did end up writing an op-ed (Reparation, not just Charity) once trying to motivate wealthy students to redistribute more of their wealth, and it received a lot of likes on social media, but I'm not sure that it led to meaningful behavioral change :/ I think behavioral changes and commitments just take a lot more work, and a supportive community to encourage it. 
