I'll be at the Impact Hub in Da Nang, Vietnam this February. Come say hi 🙂
I currently work with CE/AIM-incubated charity ARMoR on research distillation, quantitative modelling, consulting, and general org-boosting to support policies that incentivise innovation and ensure access to antibiotics to help combat AMR. I was previously an AIM Research Program fellow, was supported by an FTX Future Fund regrant and later Open Philanthropy's affected grantees program, and before that I spent 6 years doing data analytics, business intelligence, and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA and changing my mind about becoming a physicist. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting my home country Malaysia's giving landscape towards effectiveness, albeit with mixed results.
I first learned about effective altruism circa 2014 via A Modest Proposal, Scott Alexander's polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):
I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].
This essay's examples and choice of emphasis made me uneasy, despite my wholehearted agreement with the title as stated and with most of the object-level advice ("be hungry, shake complacency, don't get caught up in short-term work incentives" etc). Some scattered reactions:
It feels a bit rude to link to an 80K essay given you work there, but I think of this piece as (maybe unintentionally) encouraging single-player thinking by valorizing individual heroism, via its choice of examples, over the multiplayer mindset that doing good better together requires. Individual intensity doesn't seem to be the binding constraint on solving the world's biggest problems so much as trust / coordination / institutional quality are (alongside good judgment, more below). It's unfortunate that we have plenty of memetically galvanising anecdotes for the former and few for the latter; maybe "create more content to make multiplayer altruism sexy" should be a cause X, cf. your remark that there are too few stories of the Fred Hollows and Viktor Zhdanovs, and that they're much less famous than the Jensens and LBJs. It's also unfortunate that the traits in those anecdotes (Jensen being an asshole, LBJ being a lying manipulator) are memetically fitter than integrity / good character etc, as they're corrosive to the trust that multiplayer altruism is founded on.
If you buy that effectiveness = judgment x ambition x risk appetite, and that the essay's motivating example is a central one, then good judgment arguably beats ambition, even more so on the margin given how undersupplied it is relative to ambition in EA, and doubly so for longtermist work (cf. Holden singling it out as an aptitude, OP struggling with sign uncertainty back then, etc). You do mention this (cf. misplaced ambition), but I think it's a lot harder than "don't do a Jiro" suggests, and should be more central to the thesis.
Messaging-wise, I worry that impressionable younger folks, like me a few years ago, might take away a simplistic maximising vibe from your examples despite all the nuance, which is perilous in a way that's hard to deeply appreciate until they've developed the good judgment to see why. I think this is especially the case for talented, driven folks.
Ultimately I don't think we disagree on much. Just a bummer that "cooperation-first character-shaped judgment-steered ambition" has no chance of catching on vs "be more ambitious"...
I was going to link to the 2011 GiveWell blog post by Holden Karnofsky arguing against taking EV estimates literally, but I see Alex Berger has already mentioned it above. I'd call out these passages in particular to save folks the effort of clicking through:
While some people feel that GiveWell puts too much emphasis on the measurable and quantifiable, there are others who go further than we do in quantification, and justify their giving (or other) decisions based on fully explicit expected-value formulas. The latter group tends to critique us – or at least disagree with us – based on our preference for strong evidence over high apparent “expected value,” and based on the heavy role of non-formalized intuition in our decisionmaking. This post is directed at the latter group.
We believe that people in this group are often making a fundamental mistake, one that we have long had intuitive objections to but have recently developed a more formal (though still fairly rough) critique of. The mistake (we believe) is estimating the “expected value” of a donation (or other action) based solely on a fully explicit, quantified formula, many of whose inputs are guesses or very rough estimates. We believe that any estimate along these lines needs to be adjusted using a “Bayesian prior”; that this adjustment can rarely be made (reasonably) using an explicit, formal calculation; and that most attempts to do the latter, even when they seem to be making very conservative downward adjustments to the expected value of an opportunity, are not making nearly large enough downward adjustments to be consistent with the proper Bayesian approach.
This view of ours illustrates why – while we seek to ground our recommendations in relevant facts, calculations and quantifications to the extent possible – every recommendation we make incorporates many different forms of evidence and involves a strong dose of intuition. And we generally prefer to give where we have strong evidence that donations can do a lot of good rather than where we have weak evidence that donations can do far more good – a preference that I believe is inconsistent with the approach of giving based on explicit expected-value formulas (at least those that (a) have significant room for error (b) do not incorporate Bayesian adjustments, which are very rare in these analyses and very difficult to do both formally and reasonably).
Sequence thinking involves making a decision based on a single model of the world: breaking down the decision into a set of key questions, taking one’s best guess on each question, and accepting the conclusion that is implied by the set of best guesses (an excellent example of this sort of thinking is Robin Hanson’s discussion of cryonics). It has the form: “A, and B, and C … and N; therefore X.” Sequence thinking has the advantage of making one’s assumptions and beliefs highly transparent, and as such it is often associated with finding ways to make counterintuitive comparisons.
Cluster thinking – generally the more common kind of thinking – involves approaching a decision from multiple perspectives (which might also be called “mental models”), observing which decision would be implied by each perspective, and weighing the perspectives in order to arrive at a final decision. Cluster thinking has the form: “Perspective 1 implies X; perspective 2 implies not-X; perspective 3 implies X; … therefore, weighing these different perspectives and taking into account how much uncertainty I have about each, X.” Each perspective might represent a relatively crude or limited pattern-match (e.g., “This plan seems similar to other plans that have had bad results”), or a highly complex model; the different perspectives are combined by weighing their conclusions against each other, rather than by constructing a single unified model that tries to account for all available information.
A key difference with “sequence thinking” is the handling of certainty/robustness (by which I mean the opposite of Knightian uncertainty) associated with each perspective. Perspectives associated with high uncertainty are in some sense “sandboxed” in cluster thinking: they are stopped from carrying strong weight in the final decision, even when such perspectives involve extreme claims (e.g., a low-certainty argument that “animal welfare is 100,000x as promising a cause as global poverty” receives no more weight than if it were an argument that “animal welfare is 10x as promising a cause as global poverty”).
Finally, cluster thinking is often (though not necessarily) associated with what I call “regression to normality”: the stranger and more unusual the action-relevant implications of a perspective, the higher the bar for taking it seriously (“extraordinary claims require extraordinary evidence”).
... I don’t believe that either style of thinking fully matches my best model of the “theoretically ideal” way to combine beliefs (more below); each can be seen as a more intellectually tractable approximation to this ideal.
I believe that each style of thinking has advantages relative to the other. I see sequence thinking as being highly useful for idea generation, brainstorming, reflection, and discussion, due to the way in which it makes assumptions explicit, allows extreme factors to carry extreme weight and generate surprising conclusions, and resists “regression to normality.” However, I see cluster thinking as superior in its tendency to reach good conclusions about which action (from a given set of options) should be taken.
... Sequence thinking presumes a particular framework for thinking about the consequences of one’s actions. It may incorporate many considerations, but all are translated into a single language, a single mental model, and in some sense a single “formula.” I believe this is at odds with how successful prediction systems operate, whether in finance, software, or domains such as political forecasting; such systems generally combine the predictions of multiple models in ways that purposefully avoid letting any one model (especially a low-certainty one) carry too much weight when it contradicts the others. On this point, I find Nate Silver’s discussion of his own system and the relationship to the work of Philip Tetlock (and the related concept of foxes vs. hedgehogs) germane
While the post is over a decade old it still seems foundational to how GiveWell think about their CEAs:
Cost-effectiveness is the single most important input in our evaluation of a program's impact. However, there are many limitations to cost-effectiveness estimates, and we do not assess programs solely based on their estimated cost-effectiveness.
I think of cluster thinking-based intervention ranking as a better way to handle the optimiser's curse than the sequence-thinking-plus-Bayesian-correction approach you explored above, for these reasons, and especially because successful prediction systems across most domains use cluster rather than sequence thinking.
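To make the optimiser's curse point concrete, here's a minimal toy simulation (mine, not from the GiveWell post; all numbers are made up) showing that the intervention with the highest raw EV estimate tends to be the one whose estimate is most inflated, and how shrinking estimates toward a prior tempers this:

```python
# Toy illustration of the optimiser's curse and a simple Bayesian shrinkage fix.
import numpy as np

rng = np.random.default_rng(0)
n_interventions = 50
prior_mean, prior_sd = 10.0, 3.0     # assumed prior over true cost-effectiveness
noise_sd = 8.0                       # noisy explicit EV estimates

true_value = rng.normal(prior_mean, prior_sd, n_interventions)
estimate = true_value + rng.normal(0, noise_sd, n_interventions)

# Naive approach: pick the intervention with the highest raw estimate.
best = np.argmax(estimate)
print(f"Winner's raw estimate: {estimate[best]:.1f}, its true value: {true_value[best]:.1f}")

# Bayesian adjustment: shrink each estimate toward the prior in proportion to noise.
shrink = prior_sd**2 / (prior_sd**2 + noise_sd**2)
adjusted = prior_mean + shrink * (estimate - prior_mean)
best_adj = np.argmax(adjusted)
print(f"Adjusted winner's estimate: {adjusted[best_adj]:.1f}, "
      f"its true value: {true_value[best_adj]:.1f}")
```

Averaging several partly-independent perspectives, as cluster thinking does, has a broadly similar dampening effect on any one low-certainty model's extreme claim.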
Very interesting, thanks for writing it :) I had a brief chat with Opus 4.6 about your essay and it pointed out that the "robustness across maps" section is probably the most decision-relevant idea particularly under deep moral uncertainty, but also that the literature on robustness is less useful in practice than one might hope, working through cases in global health / AI safety / insect welfare / x-risk mitigation to illustrate. Opus concludes (mods, let me know if this longform quote is low-value AI slop and I'll remove it):
A "robustness across maps" strategy — favoring actions that look good under many theories — has a systematic directional bias. It favors interventions that are near-term, measurable, target existing beings, and operate through well-understood causal pathways. It disfavors interventions that are speculative, long-term, target merely possible beings, and depend on contested empirical models.
This is because the theories that assign high value to speculative, long-term interventions (total utilitarianism with low discount rates, for example) are precisely the theories that diverge most from other theories in their recommendations. An intervention can only be "robust" if theories with very different structures agree on it, and theories with very different structures are most likely to agree on cases where the action-relevant features (existing beings, measurable outcomes, clear causal pathways) are the ones all theories care about.
In other words: robustness-seeking is implicitly risk-averse in theory-space, and this risk aversion is not neutral — it systematically favors the neartermist portfolio. This isn't an argument against neartermism, but it is an argument that "robustness" isn't the neutral, above-the-fray methodology it's often presented as. It's a substantive position that deserves to be argued for on the merits rather than smuggled in as though it were mere prudence. ...
So what does the existing literature actually suggest you should do?
Honestly? The literature is in a state of productive but genuine confusion. MEC is the most formally developed approach but founders on normalization. The parliamentary model handles normalization differently but introduces bargaining-mechanism sensitivity. Robustness-seeking avoids both problems but has the hidden directional bias I described. "My Favourite Theory" (just go with whatever theory you assign highest credence) avoids aggregation problems entirely but seems to throw away valuable information from your uncertainty.
If I were being maximally honest about what the current state of the art justifies, I'd say something like this: the right approach is probably a hybrid where you use robustness as a first filter (if an action looks good across all plausible theories, just do it), MEC-style reasoning for decisions where robustness doesn't settle the question (accepting that normalization introduces some arbitrariness), and a precautionary overlay for irreversible decisions (where the option-value argument from the essay is actually doing real work — preserving future decision-space is one of the few principles that survives across most frameworks).
But I want to flag that this hybrid isn't a clean resolution — it's a pragmatic kludge that inherits problems from each component. The field needs either a breakthrough in inter-theoretic value comparison or a persuasive argument that the problem is fundamentally insoluble and we should accept some particular principled approximation. Neither has arrived yet.
A suggested tweak to the landscape metaphor: think of robustness as the set of directions that are uphill on most maps simultaneously. This makes it visually obvious that the set shrinks as you include more diverse maps, and it makes the directional bias visible: robust paths tend to point toward nearby, well-surveyed terrain rather than distant, poorly-mapped peaks.
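Here's a small toy simulation of that framing (mine; the dimensionality, agreement threshold, and random gradients are arbitrary choices) showing how quickly the "uphill on most maps" set shrinks as maps are added:

```python
# Each "map" contributes a random local gradient; a direction counts as robust
# if it's uphill on at least 80% of maps. More (diverse) maps -> smaller robust set.
import numpy as np

rng = np.random.default_rng(0)

def robust_fraction(n_maps, dim=3, n_directions=20_000, threshold=0.8):
    grads = rng.normal(size=(n_maps, dim))               # one gradient per map
    grads /= np.linalg.norm(grads, axis=1, keepdims=True)
    dirs = rng.normal(size=(n_directions, dim))          # candidate directions to move in
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    uphill = (dirs @ grads.T) > 0                        # is direction d uphill on map m?
    share_uphill = uphill.mean(axis=1)                   # fraction of maps agreeing, per direction
    return (share_uphill >= threshold).mean()            # fraction of directions that are "robust"

for n in (2, 5, 10, 20):
    print(f"{n:>2} maps: {robust_fraction(n):.3f} of directions are uphill on >=80% of them")
```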
Seems you and Spencer Greenberg (whose piece you linked to) are talking past each other because you disagree on what the interesting epistemic question is and/or are just writing for different audiences?
Spencer is asking "When can a single observation justify a strong inference about a general claim?" which is about de-risking overgeneralisation, a fair thing to focus on since many people generalise too readily
You're asking "When does a single observation maximally reduce your uncertainty?", which is about information-theoretic value and, like you said, is aimed more at the "stats-brained" (toy sketch at the end of this comment)
Also it seems a bit misleading to count something like "one afternoon in Vietnam" or "first day at a new job" as a single data point when it's really hundreds of them bundled together? Spencer's examples seem to lean more towards actual single data points (if not all the way). And Spencer's 4th example, on how one data point can sometimes unlock a whole bunch of other data points by triggering a figure-ground inversion that then causes you to reconsider your views, seems perfectly aligned with Hubbard's point.
That said I do think the point you're making is the more practically useful one, I guess I'm just nitpicking.
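For the stats-brained framing, here's a toy sketch (mine, not from either piece) of why the first observation tends to be the most informative: the expected reduction in uncertainty from a single Bernoulli draw is largest exactly when your prior is broadest:

```python
# Expected posterior variance after one Bernoulli observation, under a Beta prior.
from scipy.stats import beta

def expected_posterior_var(a, b):
    # One draw: success with marginal probability a/(a+b), else failure.
    p_success = a / (a + b)
    return (p_success * beta(a + 1, b).var()
            + (1 - p_success) * beta(a, b + 1).var())

for a, b, label in [(1, 1, "broad prior (almost no data)"),
                    (50, 50, "after ~100 observations")]:
    prior_var = beta(a, b).var()
    post_var = expected_posterior_var(a, b)
    print(f"{label}: prior var {prior_var:.4f} -> expected posterior var {post_var:.4f} "
          f"({100 * (1 - post_var / prior_var):.0f}% reduction)")
```

Which is basically Hubbard's point: when you know almost nothing, even one data point moves you a lot.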
re: "I think Open AI are reading too much into the data", to be perfectly honest I don't think they're reading into anything, I just interpreted it as marketing and hence dismissed it as evidence pertaining to AI progress. I'm not even being cynical, I've just worked in big corporate marketing departments for many years.
Yes definitely helpful, both for my own thinking and to be able to have something to point others to. With the caveat that learning from success stories requires some sort of survivorship bias adjustment, I think nuts-and-bolts writeups of technical policy reform success stories (as opposed to more high-level guides) are valuable and undersupplied, so if you ever get round to the more detailed writeup that would be great.
Strong-upvoted, thank you for the detailed writeup and BOTEC. I currently work in global health policy and your takeaways seem broadly right to me, I would include your reflections on the work itself in that section for folks who jump straight to that.
You handwaved away the internal cost of conducting the work (1.5 FTE for 6 months, plus overheads), but I'd be remiss not to mention that ~$100M in benefits from your Guesstimate vs an order-of-magnitude ~$100k internal cost is a ~1,000x ROI, within spitting distance of Open Phil's funding bar.
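For transparency, the BOTEC behind that (my assumptions, particularly the loaded cost per FTE-year; swap in your actual figures):

```python
# Rough BOTEC (my assumptions, not the author's figures): internal cost of the
# project vs the ~$100M benefit estimate from the Guesstimate model.
fte = 1.5
years = 0.5
loaded_cost_per_fte_year = 130_000   # assumed fully-loaded cost incl. overheads (USD)

internal_cost = fte * years * loaded_cost_per_fte_year   # ~$100k order of magnitude
benefits = 100_000_000                                    # ~$100M from the Guesstimate

print(f"Internal cost: ~${internal_cost:,.0f}")
print(f"ROI: ~{benefits / internal_cost:,.0f}x")
```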
The most surprising thing I got from your writeup was (emphasis mine)
I looked for some statistics on how many rule change requests end up resulting in rule changes to get a sense of the success rate. 548 rule change requests have been initiated. It’s hard to say how many of them were successful without spending more time, because AEMC doesn’t publish statistics on this, and some of the requests get merged into others. Also, not all rule change requests survive the consultation process intact. It could end up being quite different to what was originally proposed. A little over half of rule changes that have been initiated have commenced (incorporated in the live rules). Overall, I’m reminded of a hits-based giving approach.
Over half! I would've ballparked this at 10% give or take, so it's good to reorient my gut-feel on hit rate to empirics.
I would indeed be keen for you to write more about what you specifically did during the project as per your offer, always good to have more case studies that go into the nuts and bolts for practitioners and folks wanting to test for fit in policy careers.
I think it's more so the latter. Scott Alexander's ACX Grants gives to a ton of systemic change-flavored stuff (see here), Charity Entrepreneurship / AIM has launched a fair number of orgs that aren't RCT-based direct delivery charities (policy, effective giving, evaluators, etc.), and that's to say nothing of longtermist and meta cause areas for which strict experimental control isn't possible at all.
I'm skeptical of the blindspot claim, e.g. there's a decade-old 80K article listing the wide variety of efforts by EAs working on systemic change even then.