Bio


I currently work with the CE/AIM-incubated charity ARMoR on research distillation, quantitative modelling and general org-boosting, supporting policy advocacy for market-shaping tools that incentivise innovation in and ensure access to antibiotics, to help combat antimicrobial resistance (AMR).

I previously did AIM's Research Training Program, and was supported by an FTX Future Fund regrant and later by Open Philanthropy's affected-grantees program. Before that I spent 6 years doing data analytics, business intelligence and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA and changing my mind about becoming a physicist. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting my home country Malaysia's giving landscape towards effectiveness, albeit with mixed results.

I first learned about effective altruism circa 2014 via A Modest Proposal, Scott Alexander's polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):

I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].

How others can help me

I'm looking for "decision guidance"-type roles e.g. applied prioritization research.

How I can help others

Do reach out if you think any of the above piques your interest :)

Comments


Here is a link to archived webpage captures of the article to bypass the paywall. 

On the more practical side, there's froolow's A critical review of GiveWell's 2022 cost-effectiveness model. GiveWell's CEA spreadsheets are a lot better now in many ways than they were back then, when they had the same kinds of model design and execution issues I used to see in my previous day job managing spreadsheet-based dashboards of management metrics at a fast-growing company full of very bright but inexperienced young analysts. This part resonated with my daily pain as a relative 'non-genius' (to borrow froolow's term) compared to my peers:

It is fairly clear that the GiveWell team are not professional modellers, in the same way it would be obvious to a professional programmer that I am not a coder (this will be obvious as soon as you check the code in my Refactored model!). That is to say, there’s a lot of wasted effort in the GiveWell model which is typical when intelligent people are concentrating on making something functional rather than using slick technique. A very common manifestation of the ‘intelligent people thinking very hard about things’ school of model design is extremely cramped and confusing model architecture. This is because you have to be a straight up genius to try and design a model as complex as the GiveWell model without using modern model planning methods, and people at that level of genius don’t need crutches the rest of us rely on like clear and straightforward model layout. However, bad architecture is technical debt that you are eventually going to have to service on your model; when you hand it over to a new member of staff it takes longer to get that member of staff up to speed and increases the probability of someone making an error when they update the model.

Angelina Li's Level up your spreadsheeting (longer version: Level up your Google Sheets game) is great too, and much more granular. I would probably recommend their resource to most folks for spreadsheeting in general, and yours for CBAs more specifically.

On the "how to think about modelling better more broadly" side, Methods for improving uncertainty analysis in EA cost-effectiveness models, also by froolow, is one I think about often. I don't have a health economics background, so this argument shifted my perspective:

Uncertainty analysis is a major omission from most published EA models and seems to me like the proverbial ‘hundred dollar bill on the sidewalk’ – many of the core EA debates can be informed (and perhaps even resolved) by high-quality uncertainty analysis and I believe this could greatly improve the state of the art in EA funding decisions.

The goal of this essay is to change the EA community’s view about the minimal acceptable standard for uncertainty analysis in charity evaluation. To the extent that I use the GiveWell model as a platform to discuss broader issues of uncertainty analysis, a secondary goal of the essay is to suggest specific, actionable insights for GiveWell (and other EA cost-effectiveness modellers) as to how to use uncertainty analysis to improve their cost-effectiveness model.

This contributes to a larger strategic ambition I think EA should have, which is improving modelling capacity to the point where economic models can be used as reliable guides to action. Economic models are the most transparent and flexible framework we have invented for difficult decisions taken under resource constraint (and uncertainty), and in utilitarian frameworks a cost-effectiveness model is an argument in its own right (and debatably the only kind of argument that has real meaning in this framework). Despite this, EA appears much more bearish on the use of economic models than sister disciplines such as Health Economics. My conclusion in this piece is that there is scope for a paradigm shift in EA modelling which will improve decision-making around contentious issues.

This too, further down (this time emphasis mine): 

There is probably no single ‘most cost-effective use of philanthropic resources’. Instead, many people might have many different conceptions of the good which leads them to different conclusions even in a state of perfect knowledge about the effectiveness of interventions [1]. From reading the forums where these topics come up I don't think this is totally internalised - if it was totally internalised people would spend time discussing what would have to be true about morality to make their preferred EA cause the most cost-effective, rather than arguing that it is the actual best possible use of resources for all people [2].

Insofar as the GiveWell model is representative, it appears that resolving 'moral' disagreements (e.g. the discount rate) are likely to be higher impact than 'factual disagreements' (e.g. the effectiveness of malaria nets at preventing malaria). This is not unusual in my experience, but it does suggest that the EA community could do more to educate people around these significant moral judgements given that those moral judgements are more 'in play' than they are in Health Economics. Key uncertainties which drive model outputs include:

  • What should the discount rate for life-years and costs be? (And should it be the same for both?)
  • What is the ratio at which we would trade life-years for consumption-doublings?
  • How could we strengthen our assumptions about charity level adjustments?
  • How risk-averse should we be when donating to a charity with both upside and downside risk?
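
A minimal sketch of the kind of Monte Carlo uncertainty analysis froolow advocates, with the discount rate included as an explicit uncertain input. All of the distributions and numbers below are invented for illustration; none of them come from GiveWell's or froolow's models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # Monte Carlo draws

# Invented parameter distributions, for illustration only.
cost_per_net = rng.normal(5.0, 0.5, n)                    # USD per net delivered
deaths_averted_per_1000_nets = rng.lognormal(np.log(2.0), 0.4, n)
life_years_per_death_averted = rng.normal(35.0, 5.0, n)   # undiscounted
discount_rate = rng.uniform(0.0, 0.04, n)                 # a 'moral' parameter

def discounted_years(years, r):
    """Crudely discount a constant stream of one life-year per year."""
    t = np.arange(int(years.max()) + 1)
    weights = (1.0 + r[:, None]) ** -t[None, :]
    mask = t[None, :] < years[:, None]
    return (weights * mask).sum(axis=1)

dalys_per_dollar = (
    deaths_averted_per_1000_nets
    * discounted_years(life_years_per_death_averted, discount_rate)
    / (1000.0 * cost_per_net)
)

cost_per_daly = 1.0 / dalys_per_dollar
print(f"median cost per DALY: ${np.median(cost_per_daly):.0f}")
print(f"90% interval: ${np.quantile(cost_per_daly, 0.05):.0f} "
      f"to ${np.quantile(cost_per_daly, 0.95):.0f}")
```

The point is less the numbers than the output format: a distribution over cost-effectiveness rather than a point estimate, from which you can see how much of the spread is driven by the 'moral' inputs (like the discount rate) versus the 'factual' ones.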

I do a lot of modelling in my job, and I have to say this is the best tacit knowledge piece I've read on modelling in a while (the MC gsheet template is a nice bonus too). Bookmarked for (I expect) frequent future reference. Thanks Richard. 

A while back John Wentworth wrote the related essay What Do GDP Growth Curves Really Mean?, where he pointed out that you wouldn't be able to tell that AI takeoff was boosting the economy just by looking at GDP growth data, because of the way real GDP is calculated (emphasis mine):

I sometimes hear arguments invoke the “god of straight lines”: historical real GDP growth has been incredibly smooth, for a long time, despite multiple huge shifts in technology and society. That’s pretty strong evidence that something is making that line very straight, and we should expect it to continue. In particular, I hear this given as an argument around AI takeoff - i.e. we should expect smooth/continuous progress rather than a sudden jump.

Personally, my inside view says a relatively sudden jump is much more likely, but I did consider this sort of outside-view argument to be a pretty strong piece of evidence in the other direction. Now, I think the smoothness of real GDP growth tells us basically-nothing about the smoothness of AI takeoff. Even after a hypothetical massive jump in AI, real GDP would still look smooth, because it would be calculated based on post-jump prices, and it seems pretty likely that there will be something which isn’t revolutionized by AI. At the very least, paintings by the old masters won’t be produced any more easily (though admittedly their prices could still drop pretty hard if there’s no humans around who want them any more). Whatever things don’t get much cheaper are the things which would dominate real GDP curves after a big AI jump.

More generally, the smoothness of real GDP curves does not actually mean that technology progresses smoothly. It just means that we’re constantly updating the calculations, in hindsight, to focus on whatever goods were not revolutionized. On the other hand, smooth real GDP curves do tell us something interesting: even after correcting for population growth, there’s been slow-but-steady growth in production of the goods which haven’t been revolutionized.
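
A toy two-good version of that mechanism, with invented numbers: suppose an AI jump multiplies software output a thousandfold while its price collapses, and old-master paintings are unchanged. Valued at pre-jump prices the jump looks enormous; valued at post-jump prices, which is roughly what chained real GDP ends up doing, it looks modest.

```python
# Invented numbers, purely to illustrate the price-weighting effect.
pre  = {"software": {"qty": 1,    "price": 100.0}, "paintings": {"qty": 1, "price": 100.0}}
post = {"software": {"qty": 1000, "price": 0.1},   "paintings": {"qty": 1, "price": 100.0}}

def basket_value(quantities, prices):
    """Value one period's quantities at another period's prices."""
    return sum(quantities[g]["qty"] * prices[g]["price"] for g in quantities)

# At pre-jump prices the economy looks ~500x bigger after the jump...
print(basket_value(post, pre) / basket_value(pre, pre))    # 500.5
# ...at post-jump prices it looks only ~2x bigger, dominated by paintings.
print(basket_value(post, post) / basket_value(pre, post))  # ~2.0
```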

I do agree with your remark that

well-chosen economic indices might track “AI capabilities” in a sense more directly tied to the social and geopolitical implications of AI we actually care about for some purposes.[4] Badly chosen economic indices might not.

but for the GDP case I don't actually have any good alternative suggestions, and am curious if others do.

Curious if you happen to have written this up since?

I like this; it feels like a more EA-flavored version of Gwern's My Ordinary Life: Improvements Since the 1990s.

I'm thinking of all of his cost-effectiveness writings on this forum.

In 2011, GiveWell published the blog post Errors in DCP2 cost-effectiveness estimate for deworming, which made me lose a fair bit of confidence in DCP2 estimates (and by extension DCP3): 

we now believe that one of the key cost-effectiveness estimates for deworming is flawed, and contains several errors that overstate the cost-effectiveness of deworming by a factor of about 100. This finding has implications not just for deworming, but for cost-effectiveness analysis in general: we are now rethinking how we use published cost-effectiveness estimates for which the full calculations and methods are not public.

The cost-effectiveness estimate in question comes from the Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation. This report provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of deworming – contained a crucial typo: the published figure was $3.36-$6.92 per DALY, but the correct figure is $336-$692 per DALY. (This figure appears, correctly, on page 46 of the DCP2.) ... 

I agree with their key takeaways, in particular (emphasis mine):

  • We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.
  • More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust – no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.

That said, my best guess is that such spreadsheet errors probably don't change your bottom-line finding that charity cost-effectiveness really does follow a power law; in fact I expect the worst cases to be actively harmful (e.g. PlayPump International), i.e. negative DALYs/$. My prior essentially comes from 80K's How much do solutions to social problems differ in their effectiveness? A collection of all the studies we could find, which finds:

There appears to be a surprising amount of consistency in the shape of the distributions.

The distributions also appear to be closer to lognormal than normal — i.e. they are heavy-tailed, in agreement with Berger’s findings. However, they may also be some other heavy-tailed distribution (such as a power law), since these are hard to distinguish statistically.

Interventions were rarely negative within health (and the miscellaneous datasets), but often negative within social and education interventions (10–20%) — though not enough to make the mean and median negative. When interventions were negative, they seemed to also be heavy-tailed in negative cost effectiveness.

One way to quantify the interventions' spread is to look at the ratio between the mean of the top 2.5% and the overall mean and median. Roughly, we can say:

  • The top 2.5% were around 20–200 times more cost effective than the median.
  • The top 2.5% were around 8–20 times more cost effective than the mean.

Overall, the patterns found by Ord in the DCP2 seem to hold to a surprising degree in the other areas where we’ve found data. 
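
As a rough sanity check on those ratios, here's a small sketch (the sigmas are invented, not fit to 80K's or Ord's data) showing that lognormals with plausibly large spreads reproduce top-2.5%-to-median and top-2.5%-to-mean ratios in roughly the quoted ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented lognormal spreads; not fit to the 80,000 Hours or DCP2 data.
for sigma in (1.0, 1.5, 2.0):
    x = rng.lognormal(mean=0.0, sigma=sigma, size=1_000_000)
    top = np.sort(x)[-int(0.025 * x.size):]  # top 2.5% of draws
    print(f"sigma={sigma}: top 2.5% mean is "
          f"{top.mean() / np.median(x):.0f}x the median, "
          f"{top.mean() / x.mean():.0f}x the mean")
```

Which is also a reminder of the caveat in the quote: with at most a few hundred interventions per area, a lognormal and a power law over this range are genuinely hard to tell apart.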

Regarding your future work I'd like to see section, maybe Vasco's corpus of cost-effectiveness estimates would be a good starting point. His quantitative modelling spans nearly every category of EA interventions, his models are all methodologically aligned (since it's just him doing them), and they're all transparent too (unlike the DCP estimates). 

This writeup by Vadim Albinsky at Founders Pledge seems related: Are education interventions as cost effective as the top health interventions? Five separate lines of evidence for the income effects of better education [Founders Pledge] 

The part that seems relevant is the charity Imagine Worldwide's use of the "adaptive software" OneBillion app to teach numeracy and literacy. Despite applying several discounts and being generally conservative throughout his CEA, Vadim still gets ~11x GiveDirectly's cost-effectiveness. (I'd honestly thought, given the upvotes and engagement on the post, that Vadim had changed some EAs' minds on the promisingness of non-deworming education interventions.) The OneBillion app doesn't seem to use AI, but they already (paraphrasing) use "software to provide a complete, research-based curriculum that adapts to each child’s pace, progress, and cultural and linguistic context", so I'm not sure how much better Copilot / Rori would be?

Quoting some parts that stood out to me (emphasis mine):

This post argues that if we look at a broad enough evidence base for the long term outcomes of education interventions we can conclude that the best ones are as cost effective as top GiveWell grants. ... 

... I will argue that the combined evidence for the income impacts of interventions that boost test scores is much stronger than the evidence GiveWell has used to value the income effects of fighting malaria, deworming, or making vaccines, vitamin A, and iodine more available. Even after applying very conservative discounts to expected effect sizes to account for the applicability of the evidence to potential funding opportunities, we find the best education interventions to be in the same range of cost-effectiveness as GiveWell’s top charities. ...

When we apply the above recommendations to our median recommended education charity, Imagine Worldwide, we estimate that it is 11x as cost effective as GiveDirectly at boosting well-being through higher income. ...

Imagine Worldwide (IW) provides adaptive software to teach numeracy and literacy in Malawi, along with the training, tablets and solar panels required to run it. They plan to fund a six-year scale-up of their currently existing program to cover all 3.5 million children in grades 1-4 by 2028. The Malawi government will provide government employees to help with implementation for the first six years, and will take over the program after 2028. Children from over 250 schools have received instruction through the OneBillion app in Malawi over the past 8 years. Five randomized controlled trials of the program have found learning gains of an average of 0.33 standard deviations.  The OneBillion app has also undergone over five additional RCTs in a broad range of contexts with comparable or better results.

That's heartbreaking. Thanks for the pointer.
