Mo Putera



CE Research Training Program graduate and research intern at ARMoR under the Global Impact Placements program, working on cost-benefit analyses to help combat AMR. Currently exploring roles involving research distillation and quantitative analysis to improve decision-making, e.g. applied prioritization research, previously supported by an FTXFF regrant. Previously spent 6 years doing data analytics, business intelligence and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA. Also collaborating on a local charity evaluation initiative with the moonshot aim of reorienting Malaysia's giving landscape towards effectiveness.

I first learned about effective altruism circa 2014 via A Modest Proposal, a polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since.



something EA often misses from its bird's-eye approach to solutions - leverage.

I'd be curious as to what you mean here, since my impression was always that EA discourse heavily emphasises leverage – e.g. in the SPC framework for cause prioritisation, in career advice by 80,000 Hours and Probably Good, in GiveWell's reasoning (for instance here is how GW's spreadsheet adjusts for leverage in evaluating AMF). 

In Tom's report it's an open question:

  • To inform the size of the effective FLOP gap
    • ...
    • What is the current $ value-add of AI? How is it changes over time, or with model size?
      • Various ways of operationalising this: investment, revenues, effect on GDP.
      • Relevant for when AI will first be capable enough to readily add $trillions / year to GDP.

The closest the report gets to answering your question seems to be in the Evidence about the size of the effective FLOP gap subsection, where he says (I put footnotes in square brackets):

  • As of today the largest training run is ~3e24 FLOP. [I believe these were the requirements for PaLM.] ...
  • In my opinion, today’s AI systems are not close to being able to readily perform 20% of all cognitive tasks done by human workers. [Actually automating these tasks would add ~$10tr/year to GDP.]
  • If today’s systems could readily add $500b/year to the economy, that would correspond to automating ~1% of cognitive tasks. [World GDP is ~$100tr, about half of which is paid to human labour. If AI automates 1% of that work, that’s worth ~$500b/year.]

That last assumption bullet is what seems to have gone into the model referenced in Vasco's answer.
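The arithmetic in those bracketed footnotes can be checked with a quick sketch. The only inputs taken from the report are the ~$100tr world GDP, the roughly one-half labour share, and the 1% and 20% task fractions; the function name and the simple proportional scaling are my own assumptions for illustration, not how the report's model is implemented.

```python
# Back-of-envelope check of the footnoted figures.
# Assumption (mine): value added scales linearly with the fraction of
# cognitive tasks automated, applied to the labour share of world GDP.

WORLD_GDP_USD = 100e12   # ~$100tr, from the footnote
LABOUR_SHARE = 0.5       # "about half of which is paid to human labour"


def value_of_automation(fraction_of_tasks: float,
                        world_gdp: float = WORLD_GDP_USD,
                        labour_share: float = LABOUR_SHARE) -> float:
    """Annual $ value of automating a given fraction of cognitive tasks."""
    return fraction_of_tasks * labour_share * world_gdp


for fraction in (0.01, 0.20):
    value = value_of_automation(fraction)
    print(f"{fraction:.0%} of tasks: ${value / 1e12:.1f}tr/year")
```

Under these assumptions, 1% of tasks comes to about $0.5tr/year and 20% to about $10tr/year, matching the two bracketed figures.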

You may have also seen Sam Clarke's classification of AI x-risk sources, just sharing for others :) 

Wei Dai and Daniel Kokotajlo's older longlist might be worth perusing too?

As someone predisposed to like modeling, the key takeaway I got from Justin Sandefur's Asterisk essay PEPFAR and the Costs of Cost-Benefit Analysis was this corrective reminder – emphasis mine, focusing on what changed my mind:

Second, economists were stuck in an austerity mindset, in which global health funding priorities were zero-sum: $300 for a course of HIV drugs means fewer bed nets to fight malaria. But these trade-offs rarely materialized. The total budget envelope for global public health in the 2000s was not fixed. PEPFAR raised new money. That money was probably not fungible across policy alternatives. Instead, the Bush White House was able to sell a dramatic increase in America’s foreign aid budget by demonstrating that several billion dollars could, realistically, halt an epidemic that was killing more people than any other disease in the world. 


A broader lesson here, perhaps, is about getting counterfactuals right. In comparative cost-effectiveness analysis, the counterfactual to AIDS treatment is the best possible alternative use of that money to save lives. In practice, the actual alternative might simply be the status quo, no PEPFAR, and a 0.1% reduction in the fiscal year 2004 federal budget. Economists are often pessimistic about the prospects of big additional spending, not out of any deep knowledge of the budgeting process, but because holding that variable fixed makes analyzing the problem more tractable. In reality, there are lots of free variables.

More detail:

Economists’ standard optimization framework is to start with a fixed budget and allocate money across competing alternatives. At a high-level, this is also how the global development community (specifically OECD donors) tends to operate: foreign aid commitments are made as a proportion of national income, entirely divorced from specific policy goals. PEPFAR started with the goal instead: Set it, persuade key players it can be done, and ask for the money to do it.

Bush didn’t think like an economist. He was apparently allergic to measuring foreign aid in terms of dollars spent. Instead, the White House would start with health targets and solve for a budget, not vice versa. ... Economists are trained to look for trade-offs. This is good intellectual discipline. Pursuing “Investment A” means forgoing “Investment B.” But in many real-world cases, it’s not at all obvious that the realistic alternative to big new spending proposals is similar levels of big new spending on some better program. The realistic counterfactual might be nothing at all.

In retrospect, it seems clear that economists were far too quick to accept the total foreign aid budget envelope as a fixed constraint. The size of that budget, as PEPFAR would demonstrate, was very much up for debate.

When Bush pitched $15 billion over five years in his State of the Union, he noted that $10 billion would be funded by money that had not yet been promised. And indeed, 2003 marked a clear breaking point in the history of American foreign aid. In real-dollar terms, aid spending had been essentially flat for half a century at around $20 billion a year. By the end of Bush’s presidency, between PEPFAR and massive contracts for Iraq reconstruction, that number hovered around $35 billion. And it has stayed there since. 

Compared to normal development spending, $15 billion may have sounded like a lot, but exactly one sentence after announcing that number in his State of the Union address, Bush pivoted to the case for invading Iraq, a war that would eventually cost America something in the region of $3 trillion — not to mention thousands of American and hundreds of thousands of Iraqi lives. Money was not a real constraint.

Tangentially, I suspect this sort of attitude (Iraq invasion notwithstanding) would naturally arise out of a definite optimism mindset (that essay by Dan Wang is incidentally a great read; his follow-up is more comprehensive and clearly argued, but I prefer the original for inspiration). It seems to me that Justin has this mindset as well, cf. his analogy to climate change in comparing economists' carbon taxes and cap-and-trade schemes vs progressive activists pushing for green tech investment to bend the cost curve. He concludes: 

You don’t have to give up on cost-effectiveness or utilitarianism altogether to recognize that these frameworks led economists astray on PEPFAR — and probably some other topics too. Economists got PEPFAR wrong analytically, not emotionally, and continue to make the same analytical mistakes in numerous domains. Contrary to the tenets of the simple, static, comparative cost-effectiveness analysis, cost curves can sometimes be bent, some interventions scale more easily than others, and real-world evidence of feasibility and efficacy can sometimes render budget constraints extremely malleable. Over 20 years later, with $100 billion dollars appropriated under both Democratic and Republican administrations, and millions of lives saved, it’s hard to argue a different foreign aid program would’ve garnered more support, scaled so effectively, and done more good. It’s not that trade-offs don’t exist. We just got the counterfactual wrong.

Aside from his climate change example above, I'd be curious to know what other domains economists are making analytical mistakes in w.r.t. cost-benefit modeling, since I'm probably predisposed to making the same kinds of mistakes. 

One of the more surprising things I learned from Karen Levy's 80K podcast interview on misaligned incentives in global development was how her experience directly contradicted a stereotype I had about for-profits vs nonprofits: 

Karen Levy: When I did Y Combinator, I expected it to be a really competitive environment: here you are in the private sector and it’s all about competition. And I was blown away by the level of collaboration that existed in that community — and frankly, in comparison to the nonprofit world, which can be competitive. People compete for funding, and so very often we’re fighting over slices of the same pie. Whereas the Y Combinator model is like, “We’re making the pie bigger. It’s getting bigger for everybody.”

My assumption had been that the opposite was true. 

Tomasik's claim (emphasis mine)

I suspect many charities differ by at most ~10 to ~100 times, and within a given field, the multipliers are probably less than a factor of ~5.

reminded me of this (again emphasis mine) from Ben Todd's 80K article How much do solutions to social problems differ in their effectiveness? A collection of all the studies we could find

Overall, my guess is that, in an at least somewhat data-rich area, using data to identify the best interventions can perhaps boost your impact in the area by 3–10 times compared to picking randomly, depending on the quality of your data.

This is still a big boost, and hugely underappreciated by the world at large. However, it’s far less than I’ve heard some people in the effective altruism community claim.

In addition, there are downsides to being data-driven in this way — by insisting on a data-driven approach, you might be ruling out many of the interventions in the tail (which are often hard to measure, and so will be missing). This is why we advocate for first aiming to take a ‘hits-based’ approach, rather than a data-driven one.

"Hits-based rather than data-driven" is a pretty thought-provoking corrective to me, as I'm maybe biased by my background having worked in data-rich environments my whole career.

I thought I had mostly internalized the heavy-tailed worldview from a life-guiding perspective, but reading Ben Kuhn's Searching for outliers made me realize I hadn't. So here are some summarized reminders for posterity:

  • Key idea: lots of important things in life generated by multiplicative processes resulting in heavy-tailed distributions – jobs, employees / colleagues, ideas, romantic relationships, success in business / investing / philanthropy, how useful it is to try new activities  
  • Decision relevance to living better, i.e. what Ben thinks I should do differently:
    • Getting lots of samples improves outcomes a lot, so draw as many samples as possible
    • Trust the process and push through the demotivation of super-high failure rates (instead of taking them as evidence that the process is bad)
    • But don't just trust any process; it must have 2 parts: (1) a good way to tell if a candidate is an outlier ("maybe amazing" below) (2) a good way to draw samples 
    • Optimize less, draw samples more (for a certain type of person)
    • Filter for "maybe amazing", not "probably good", as they have different traits
    • Filter for "ruling in" candidates, not "ruling out" (e.g. in dating)
    • Cultivate an abundance mindset to help reject more candidates early on (to find 99.9th percentile not just 90th)
    • Think ahead about what outliers look like, to avoid accidentally rejecting 99.9th percentile candidates out of miscalibration, by asking others based on their experience 
  • My reservations with Ben's advice, despite thinking it's mostly sound and idea-generating:
    • "Stick with the process through super-high failure rates instead of taking them as evidence that the process is bad" feels uncomfortably close to protecting a belief from falsification
    • Filtering for "maybe amazing", not "probably good" makes me uncomfortable because I'm not risk-neutral (e.g. in RP's CCM I'm probably closest to "difference-making risk-weighted expected utility = low to moderate risk aversion", which for instance assesses RP's default AI risk misalignment megaproject as resulting in, not averting, 300+ DALYs per $1k)
    • Unlike Ben, I'm a relatively young person in a middle-income country, and the abundance mindset feels privileged (i.e. not as much runway to try and fail) 
  • So maybe a precursor / enabling activity for the "sample more" approach above is "more runway-building": money, leisure time, free attention & health, proximity to opportunities(?)
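To make the "draw more samples" point concrete for myself, here is a minimal simulation sketch. It is not from Ben's post: the lognormal and normal distributions, their parameters, and the function names are all illustrative assumptions I picked to contrast a heavy-tailed process with a thin-tailed one.

```python
import random
import statistics

# Illustrative assumptions (not from the post): candidate quality is
# lognormal in the heavy-tailed case and normal in the thin-tailed case.


def draw_heavy(rng: random.Random) -> float:
    """One candidate from a heavy-tailed (lognormal) distribution."""
    return rng.lognormvariate(0.0, 1.5)


def draw_thin(rng: random.Random) -> float:
    """One candidate from a thin-tailed (normal) distribution."""
    return rng.gauss(10.0, 1.0)


def mean_best_of(n_samples: int, draw, n_trials: int = 2000, seed: int = 0) -> float:
    """Average, over many trials, of the best candidate among n_samples draws."""
    rng = random.Random(seed)
    return statistics.fmean(
        max(draw(rng) for _ in range(n_samples)) for _ in range(n_trials)
    )


for n in (1, 10, 100):
    heavy = mean_best_of(n, draw_heavy)
    thin = mean_best_of(n, draw_thin)
    print(f"best of {n:>3}: heavy-tailed {heavy:7.1f} | thin-tailed {thin:5.1f}")
```

In this toy setup the best heavy-tailed candidate keeps improving by large multiples as the sample count grows, while the best thin-tailed candidate barely moves. The size of the gap depends entirely on the parameters I assumed, so treat it as intuition rather than evidence.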

suppose that humanity is extinct (or reduced to a locked-in state) by 3000 CE (or any other period you choose); how likely is it that factor x figures in a causal chain leading to that?

Perhaps not a direct answer to your question, but this reminded me of the Metaculus Ragnarok series.

Just came across Max Dalton's 2014 writeup Estimating the cost-effectiveness of research into neglected diseases, part of Owen Cotton-Barratt's project on estimating cost-effectiveness of research and similar activities. Some things that stood out to me:

  • High-level takeaways
    • ~100x 95% CI range (mostly from estimates of total current funding to date, and difficulty of continuing with research), so figures below can't really argue for change in priorities so much as compel further research 
      • This uncertainty is a lower bound, including only statistical uncertainty and not model uncertainty 
    • Differing returns to research are largely driven by disease burden size, so look at diarrheal diseases, malaria, hookworm, ascariasis, trichuriasis, lymphatic filariasis, meningitis, typhoid, and salmonella – i.e. nothing too surprising 
  • Estimated figures: 
    • 13.9 DALYs/$1k for the sector as a whole (vs ~20 DALYs/$1k for GWWC top charities back in 2014), 95% CI 1.43-130 DALYs/$1k 
    • Median estimates: diarrheal disease e.g. cholera and dysentery 121 DALYs/$1k, salmonella infections 74 DALYs/$1k, worms ~50 DALYs/$1k, leprosy 0.058 DALYs/$1k
    • Most of the top diseases have ~100x 95% CI range, except salmonella whose range is ~3,000x(!) 
  • References

Thanks Joel, I think I agree.

I don't want to take up your time so feel free to not reply, just wondering out loud how Jeroen's method compares to the systematic cause mapping approach Michael Plant suggested in his thesis for generating new promising causes. I suppose the latter can be interpreted as a systematic way to implement Jeroen's method; for instance, starting from this table in Plant's thesis generating happiness intervention ideas... 

...Plant notes that many solutions apply to several primary causes (rows), inviting the idea of solution clustering (as illustrated below). I suppose Jeroen's "increasing cycling rates in cities instead of car usage" example would be what Plant calls a secondary cause, or whatever is more granular than secondary cause. Your longlist of causes seems relevant here too. 

(Aside: I'm not quite a fan of the 'primary vs secondary cause' naming, since the shared 'cause' name makes me think they're the same kind of thing when they're not – primary causes are problems, while secondary causes are solutions. 'Intervention area / cluster' would've been more illuminating I think.)
