Feedback welcome: www.admonymous.co/mo-putera
I work with the CE/AIM-incubated charity ARMoR on research distillation, quantitative modelling, consulting, MEL, and general org-boosting, supporting policies that incentivise innovation and ensure access to antibiotics to help combat AMR. I was previously an AIM Research Program fellow, was supported by an FTX Future Fund regrant and later Open Philanthropy's affected-grantees program, and before that spent 6 years doing data analytics, business intelligence, and knowledge + project management across various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA and changing my mind about becoming a physicist. I've also initiated some local priorities research efforts, e.g. a charity evaluation initiative with the moonshot aim of reorienting my home country Malaysia's giving landscape towards effectiveness, albeit with mixed results.
I first learned about effective altruism circa 2014 via A Modest Proposal, Scott Alexander's polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since, although my relationship to it has changed quite a bit; I related to Tyler's personal story (which unsurprisingly also references A Modest Proposal as a life-changing polemic):
I thought my own story might be more relatable for friends with a history of devotion – unusual people who’ve found themselves dedicating their lives to a particular moral vision, whether it was (or is) Buddhism, Christianity, social justice, or climate activism. When these visions gobble up all other meaning in the life of their devotees, well, that sucks. I go through my own history of devotion to effective altruism. It’s the story of [wanting to help] turning into [needing to help] turning into [living to help] turning into [wanting to die] turning into [wanting to help again, because helping is part of a rich life].
By empirical evidence I meant anything empirical at all, including things like emergent misalignment and what might come out of Jacob Steinhardt's interpretability program and what Ryan Greenblatt says here and so on, not just observable behavior. Maybe I'm conflating things or overloading "empirical", in which case my apologies.
Regarding the sharp left turn, Byrnes' opinionated review is the best argument for worrying about it that I'm aware of, but he isn't talking about today's LLMs and their descendants, which rules out your last paragraph's pointer to current work. Roger Dearnaley's intuition pump behind his take that the sharp left turn might not be as hopeless as it seems resonates with me, but his description seems vibes-based, so I can't tell whether he's misunderstanding the sharp left turn. I do think Dearnaley's personal "full-stack" attempt at assessing alignment progress is the sort of answer I'd want to your question re: what sort of work would be good evidence, although my impression is you disagree for high-level generator reasons that would be ~intractable to resolve within the margins of EA Forum comments...
What do you think of efforts like Saffron Huang et al. 2025? It's from a year ago as of this week, so I'd guess Anthropic has since developed this line of work further and integrated it into other workstreams and such.
AI assistants can impart value judgments that shape people's decisions and worldviews, yet little is known empirically about what values these systems rely on in practice. To address this, we develop a bottom-up, privacy-preserving method to extract the values (normative considerations stated or demonstrated in model responses) that Claude 3 and 3.5 models exhibit in hundreds of thousands of real-world interactions. We empirically discover and taxonomize 3,307 AI values and study how they vary by context. We find that Claude expresses many practical and epistemic values, and typically supports prosocial human values while resisting values like "moral nihilism". While some values appear consistently across contexts (e.g. "transparency"), many are more specialized and context-dependent, reflecting the diversity of human interlocutors and their varied contexts. For example, "harm prevention" emerges when Claude resists users, "historical accuracy" when responding to queries about controversial events, "healthy boundaries" when asked for relationship advice, and "human agency" in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, our work creates a foundation for more grounded evaluation and design of values in AI systems.
The way the benefits calculation cashes out on an individual-beneficiary basis essentially requires that beneficiaries (mostly under-5s) live out full lives and enjoy 40 years of increased income; it isn't a function of how long the nets last.
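As a toy sketch of that structure (hypothetical function and placeholder numbers, not GiveWell's actual figures or discounting), note that the assumed net lifespan never enters the per-beneficiary benefit:

```python
# Toy illustration (placeholder numbers, not GiveWell's actual model): the
# per-beneficiary benefit is ~40 years of increased adult income credited to
# an averted under-5 death, so it doesn't change with assumed net lifespan.

def benefit_per_beneficiary(annual_income_gain: float, years_of_benefit: int = 40) -> float:
    """Undiscounted income benefit credited to one averted under-5 death."""
    return annual_income_gain * years_of_benefit

for net_lifespan_years in (2, 3, 5):
    # net_lifespan_years would affect how many deaths the nets avert (the cost
    # side), but the benefit credited per averted death stays the same:
    print(net_lifespan_years, benefit_per_beneficiary(annual_income_gain=100.0))
```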
I'm not sure this addresses Henry's critiques? In general, every bullet listed under "I think EA has punched above its weight in many ways with respect to making AI go well" is a proxy somewhere in the middle of the ToC chain, while his comment is more end-of-ToC focused, since he's skeptical of the proxies actually being beneficial; and none of these bullets address the counterfactuality he brought up. For instance, you mentioned the founding of Redwood Research as an example of EA making AI go well, despite Henry explicitly being skeptical of its impact so far:
AI Safety organisations like MIRI and Redwood Research have been operating for 25 and 5 years respectively. As an outsider I couldn't point to any particular breakthrough they've made in AI alignment. Redwood seems to do some kinda interesting work on measuring rogue behaviour and creating checks. I dunno. Seems like any organisation trying to make a reliable AI product would be heavily incentivised to do this stuff regardless.
To be clear, I'm not taking sides or anything; I'm just disheartened by what I perceive to be a lot of talking past each other between AIS advocates and skeptics on this forum, much of which seems easily preventable, as in this case.
On the TV thing, here's an extended quote from Poor Economics in case it's of interest to others as well (emphasis mine):
The decision to spend money on things other than food may not be due entirely to social pressure. We asked Oucha Mbarbk, a man we met in a remote village in Morocco, what he would do if he had more money. He said he would buy more food. Then we asked him what he would do if he had even more money. He said he would buy better-tasting food. We were starting to feel very bad for him and his family, when we noticed a television, a parabolic antenna, and a DVD player in the room where we were sitting. We asked him why he had bought all these things if he felt the family did not have enough to eat. He laughed, and said, “Oh, but television is more important than food!”
After spending some time in that Moroccan village, it was easy to see why he thought that. Life can be quite boring in a village. There is no movie theater, no concert hall, no place to sit and watch interesting strangers go by. And not a lot of work, either. Oucha and two of his neighbors, who were with him during the interview, had worked about seventy days in agriculture and about thirty days in construction that year. For the rest of the year, they took care of their cattle and waited for jobs to materialize. This left plenty of time to watch television. These three men all lived in small houses without water or sanitation. They struggled to find work, and to give their children a good education. But they all had a television, a parabolic antenna, a DVD player, and a cell phone.
Generally, it is clear that things that make life less boring are a priority for the poor. This may be a television, or a little bit of something special to eat—or just a cup of sugary tea. Even Pak Solhin had a television, although it was not working when we visited him. Festivals may be seen in this light as well. ...
Orwell captured this phenomenon as well in The Road to Wigan Pier when he described how poor families managed to survive the depression.
Instead of raging against their destiny, they have made things tolerable by reducing their standards. But they don’t necessarily reduce their standards by cutting out luxuries and concentrating on necessities; more often it is the other way around—the more natural way, if you come to think of it—hence the fact that in a decade of unparalleled depression, the consumption of all cheap luxuries has increased.
These “indulgences” are not the impulsive purchases of people who are not thinking hard about what they are doing. They are carefully thought out, and reflect strong compulsions, whether internally driven or externally imposed. Oucha Mbarbk did not buy his TV on credit—he saved up over many months to scrape enough money together, just as the mother in India starts saving for her eight-year-old daughter’s wedding some ten years or more into the future, by buying a small piece of jewelry here and a stainless steel bucket there.
We are often inclined to see the world of the poor as a land of missed opportunities and to wonder why they don’t put these purchases on hold and invest in what would really make their lives better. The poor, on the other hand, may well be more skeptical about supposed opportunities and the possibility of any radical change in their lives. They often behave as if they think that any change that is significant enough to be worth sacrificing for will simply take too long. This could explain why they focus on the here and now, on living their lives as pleasantly as possible, celebrating when occasion demands it.
Re: the latter, maybe you can get inspiration from RP's CCM > existential risk > "small-scale AI misalignment project" and check out the graphics below. Their default params are a 96.4% chance of no effect, a 70% chance of a positive outcome conditional on any effect, and a +30% rise in p(extinction) conditional on a negative outcome; you can change them and see how the EV updates. These defaults matter less than the takeaway that AIS work needs to be robustly positive, and that folks whose risk aversion is greater than zero (probably wise) will do well to prioritise resolving this sign uncertainty, which boils down to Michael's advice above (cf. the advice to build deep models, or Dave Banerjee's advice more specifically).
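For anyone who can't open the CCM, here's a minimal sketch of how the sign uncertainty drives the EV, using the default params quoted above. Normalising a good outcome to +1 unit of reduced extinction risk, and reading the "+30%" parameter as a bad outcome being 1.3x as harmful as a good outcome is beneficial, are my simplifying assumptions, not RP's actual model:

```python
# Minimal sketch (not RP's actual CCM) of how sign uncertainty drives the EV
# of a small-scale AI misalignment project, using the default params above.

def expected_value(p_no_effect: float, p_positive_given_effect: float,
                   harm_ratio: float) -> float:
    """EV in units where a good outcome = +1 unit of reduced extinction risk.

    harm_ratio: how much larger the p(extinction) increase from a bad outcome
    is relative to the decrease from a good one (assumed reading of "+30%").
    """
    p_effect = 1 - p_no_effect
    p_good = p_effect * p_positive_given_effect
    p_bad = p_effect * (1 - p_positive_given_effect)
    return p_good * 1.0 - p_bad * harm_ratio

# CCM-style defaults quoted above:
print(expected_value(0.964, 0.70, 1.30))  # ~ +0.011 units: positive, but barely
# Nudge the conditional-sign parameter a little and the EV flips negative:
print(expected_value(0.964, 0.55, 1.30))  # ~ -0.001 units
```

The point of fiddling with the second parameter is just that small shifts in the probability of the work being net-positive flip the overall sign, which is why resolving the sign uncertainty dominates the default numbers themselves.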
Having followed a lot of AI benchmarks over the years, my main heuristic takeaway regarding expert-parity claims is "prepare to be disappointed once you dig in", alongside "but they were still useful in advancing understanding and progress"; cf. SemiAnalysis' "Benchmarks are bad but we need to keep using them anyways" section for an outside-of-EA perspective. I'm also less bullish on long-range, poor-feedback-loop superforecasting more generally, for reasons along the lines of superforecaster Eli Lifland's takes (esp. #2 and #4), Dan Luu's appendix notes and comparisons to the actually-accurate futurists his review found, nostalgebraist on Metaculus badness, etc., which collectively reduce my enthusiasm for automating this.