In 2022, two of Anthropic's co-founders, Daniela and Dario Amodei, described the company as a small group that left OpenAI in late 2020 and early 2021, during the pandemic, meeting masked in backyards, because they wanted to make what they called a "focused research bet" with a tightly aligned team. The bet was on building AI systems that are "helpful, honest, and harmless". On race dynamics, Dario was explicit that Anthropic should build models close to the state of the art (because you need frontier models to study frontier safety problems), but that it shouldn't race ahead, build models bigger than other orgs', or ramp up excitement and hype about giant models. Fast forward to today: Anthropic is a frontier for-profit company valued at $380 billion as of February 2026; it is pursuing agentic coding on a technical trajectory that approaches autonomous recursive self-improvement; Dario has among the shortest AGI/TAI timelines of any major AI lab CEO; the recently revised RSP has drawn criticism for dropping what many in the community interpreted as a commitment to pause development under certain risk thresholds; and, what this piece is primarily about, the "~oversight" organisations Anthropic once called for will apparently soon be funded by Anthropic's own people. The founding rhetoric of humility hasn't quite changed, but the structural conditions around it have changed enormously.
Anthropic's own stated reason for the RSP revision, that unilateral restraint in a competitive environment cedes ground to less responsible actors, is basically a textbook articulation of the competitive trap Scott Alexander described in his Moloch essay. Since the RSP revision has been discussed in depth by others, I won't go into it here. I want to focus on something else: what happens when the philanthropy of an AI company's workforce becomes the primary funding source for the organisations meant to oversee that company.
A week ago, I came across the Transformer article on Anthropic's approaching philanthropic windfall. The tl;dr of the article, courtesy of Gemini: Anthropic's co-founders have pledged 80% of their wealth to charity; a $6B tender offer and an upcoming IPO are turning paper wealth into liquid capital; and organisations like Coefficient Giving are positioned to manage massive inflows focused on AI safety, animal welfare, and global health.
A note on framing:
What I'm describing is akin to (or straight up is) regulatory capture operating through philanthropic infrastructure, with some Goodhart-style dynamics layered on top. I'm using the Moloch frame because this audience knows it and because the "emergent, nobody-chose-this" quality of the outcomes matters. But the canonical Moloch describes competitive multipolar traps, and what I'm describing involves a cooperative community drifting into a structural trap. That's a different category of Moloch, I suppose. I'll use Moloch where the competitive angle genuinely applies (Anthropic does exist in a competitive industry that shapes everything downstream) and lean on regulatory capture where it doesn't.
The causal chain
Here's the skeleton of my argument. Decide for yourself whether each link holds.
EA's cause prioritization framework identified AI safety as a top cause area. This attracted EA-aligned people into AI companies. Some of those companies, like Anthropic and, to a good extent, even OpenAI, became enormously valuable. This made EA-aligned AI employees wealthy. Their wealth is now flowing back into EA-aligned organisations through the donation infrastructure EA built. Those organisations include the ones evaluating and overseeing the AI companies that made the donors wealthy. The EA framework can ask "is this organisation doing cost-effective work?" It may struggle to ask "has my definition of cost-effective work been shaped by the financial interests of my donor base?" because that's a question about the framework itself, not one the framework is designed to answer.
No one designed this loop. But (and this is where the "nobody chose this" framing needs some qualification) donors do choose where to give, org leaders do choose what to research, and grantmakers do choose whom to fund. These are legible decisions made by people with shared backgrounds, networks, and financial interests. I think calling the outcome "emergent" captures something real about how individual rationality aggregates into structural dysfunction, but it shouldn't mystify a process that sociologists and political scientists have written about. The weakest link in this chain, I suppose, is the leap from "aggregated individual preferences" to "structural capture." A natural objection is: isn't this just a community funding what it believes in? When does funding-what-you-believe-in become "capture"? I'd say it does at the point where the funding relationship compromises the recipient's capacity to produce findings that threaten the funder's interests, regardless of whether that capacity is currently being exercised. A pharma company that funds the research institute evaluating its drugs has a structural problem even if the drugs happen to be good and the researchers happen to be honest. The problem is that the funding dependency erodes the capacity to say "this drug doesn't work", a capacity whose value doesn't depend on whether any particular case calls for it. Similarly, if AIS orgs are financially dependent on Anthropic-sourced wealth, their capacity to say "Anthropic should be constrained" is structurally eroded even if Anthropic is currently behaving well. This is a concern about independence and institutional design, not about any individual's "goodness", beliefs, or integrity.
Selection mechanisms
Organisations that produce work compatible with Anthropic's continued operation (evaluations that feed into the RSP, alignment research that improves Anthropic's products, governance frameworks that legitimize Anthropic's approach) are positioned to receive that funding. Organisations whose work implies Anthropic should slow down, stop, or submit to external authority (PauseAI, certain configurations of MIRI, maybe the likes of FLI) are unlikely to receive much. Nobody needs to make a phone call. The selection happens through aggregated donor preferences.
You could ask: how do you know Anthropic employees will form a coherent donor bloc? Anthropic employs thousands of people with diverse views. I'm pretty sure some are internally critical of the RSP revision. Some may fund PauseAI-adjacent work. Pharma company employees donate to pharma-critical organisations all the time! Fair point. The selection mechanism I'm describing requires sufficient convergence, not unanimity. My claim is that the professional context (shared information environment, shared social network, shared financial interest in Anthropic's success, shared advisory infrastructure) will produce enough convergence in aggregate to meaningfully shape the funding landscape. Whether it actually does is testable, and I'll offer some predictions.
Counterforces
Before going further, I'll give some weight to what pushes against this concern.
Government funding for AI safety is growing: UK AISI, US AISI, EU AI Act enforcement bodies. Non-EA philanthropists (the Templeton Foundation, the Hewlett Foundation, the MacArthur Foundation, the Alfred P. Sloan Foundation, among others) have begun funding AI safety and governance work. If these alternative sources scale fast enough, the funding concentration I'm worried about may not materialize, or may be substantially diluted. This is a counterforce that could dissolve the strongest version of my concern.
Also, I don't know exactly what institutional safeguards currently exist within the EA funding infrastructure. Does Coefficient Giving have relevant conflict-of-interest policies? Funding diversification requirements? If robust safeguards exist, the mechanism by which funding concentration translates into epistemic capture may be significantly blocked. If anyone closer to this infrastructure can speak to what safeguards exist (or don't), I'd like to know! I'm raising the structural concern because I think it's important enough to warrant community attention. This is more "serious question that deserves investigation" than "diagnosis of existing capture."
Value drift and (perverse?) incentive structure
An employee who joins Anthropic and believes in AI safety will, over time, likely have their understanding of "AI safety" shaped by Anthropic's institutional perspective, through immersion rather than any deliberate indoctrination. Colleagues, friends, competitors, and funded organisations all reflect the same perspective back. From the inside, this feels like learning. From the outside, it could be environmentally induced narrowing. I'd wager that these are pretty hard to distinguish.
All this then interacts with the donation infrastructure. The more faithfully an employee follows EA-recommended giving pathways, the more their giving may flow toward organisations whose work is compatible with their employer's interests, because the advisory infrastructure has likely been shaped by people who share their professional context. This infrastructure's effectiveness at concentrating resources, normally its greatest virtue, becomes in this specific case a mechanism for concentrating oversight capacity in industry-compatible directions. When pharmaceutical companies fund continuing medical education, the content systematically skews toward their products because the funding relationship shapes what gets produced, not necessarily because anyone is lying. When tobacco companies funded research, the funded research systematically delayed consensus on harm through emphasis and framing, not just outright falsification. The mechanism is funding dependency shaping outputs over time.
Where EA can't measure its own success
EA's optimisation function works brilliantly when outcomes are measurable: malaria nets distributed, lives saved, cost-per-outcome calculated. The optimisation has an external referent: it can be wrong and correct itself.
In AI x-risk (forget s-risks), the external referent is weaker. Of course there are near-term measurables (rates of deceptive outputs, robustness failures, autonomous action incidents, things along those lines), but the core concern of preventing catastrophic outcomes from transformative AI lacks tight feedback loops. Why? Because the catastrophic outcomes we're trying to prevent haven't happened yet. The only available proxy for "good AIS x-risk work" is the judgment of people the community considers expert; in practice, people embedded in AI companies and EA-aligned research organisations. When the measure of "high-impact AI safety work" becomes the target for funding-seekers, it ceases to be a good measure: Goodhart's Law operating at the level of a cause area.
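A toy illustration of that dynamic, purely as a sketch (the distributions and the noise level are made up, and "true safety value" is obviously not observable in reality): if funders rank projects on a noisy proxy for value, then the harder they select on the proxy, the more the proxy overstates the true value of what actually gets funded.

```python
import random

random.seed(0)

N = 100_000   # hypothetical number of projects (made-up toy data)
NOISE = 1.0   # how loosely the proxy ("judged impact") tracks true value

# Each project has an unobservable true safety value and a noisy proxy
# score standing in for expert judgment.
projects = []
for _ in range(N):
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, NOISE)
    projects.append((true_value, proxy))

def proxy_gap(frac):
    """Among the top `frac` of projects ranked by proxy, return how much
    the average proxy score overstates the average true value."""
    k = max(1, int(N * frac))
    top = sorted(projects, key=lambda p: p[1], reverse=True)[:k]
    mean_true = sum(v for v, _ in top) / k
    mean_proxy = sum(p for _, p in top) / k
    return mean_proxy - mean_true

# Harsher selection on the proxy -> larger gap between proxy and true value.
for frac in (0.5, 0.1, 0.01):
    print(f"funding the top {frac:.0%} by proxy: gap = {proxy_gap(frac):.2f}")
```

The specific numbers don't matter; the point is just that optimisation pressure on a proxy widens the gap between the proxy and the thing it stands for.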
All of this is testable
I'm not claiming that Anthropic employees are hypocrites, that EA is a front, or that philanthropy is insincere. I'm claiming that sincerity is not a firewall against structural capture. Below are some directional hypotheses, not clean tests. They're better than nothing, and I'm putting them down so we can check:
If the structural concern is real, within five years we should observe: (1) Anthropic equity-derived funding grows to represent a dominant share (I'd say >40%) of AI safety nonprofit funding. I lack current baseline data and would welcome corrections; knowing the current funding landscape precisely would itself be useful. (2) Organisations whose published work has recommended binding constraints on frontier AI development (not just evaluations that feed into voluntary frameworks) experience below-average funding growth relative to the overall AIS funding pool. (3) METR's budget grows more slowly than those of Anthropic-aligned evaluation organisations, unless it secures substantial government or non-industry funding. (METR could, of course, thrive or struggle for idiosyncratic reasons unrelated to the dynamics I'm describing.) No single prediction is decisive, but a pattern across multiple indicators would be.
Conversely, if five years from now AI safety funding is genuinely diverse, with non-industry sources comprising more than 50% of total AIS funding, and organisations are producing findings that result in binding constraints on Anthropic's operations, then I'm wrong. I hope I am wrong, and that all that'll happen is more vaccines, more QALYs, and fewer farmed animals.
EA often presents itself as Scott's "Elua": the rational coordination mechanism that channels resources toward what actually matters. For global health, animal suffering, and CBRN/GCRs, I think it credibly is. In AI safety, the entanglement between the optimiser and the optimised-for creates structural tension: EA-aligned people build the AI, EA-aligned organisations write the evaluations, and EA-aligned donors fund the evaluators. But METR's independence and this very discussion are evidence the system isn't fully closed, at least not yet. Still, the entanglement at multiple levels of the feedback loop is concerning, and the independent orgs may face resource disadvantages.
The question we need to sit with is not "are we individually acting in good faith?" (of course we are) but "does our collective structure have the capacity to produce findings and fund organisations that could materially threaten the financial interests of our most powerful members?" If the answer is yes, the structure is healthy. If the answer is "it wouldn't need to, because our most powerful members are aligned with good outcomes", that's exactly the answer a captured system would produce.
If I'm wrong, I've wasted your time with an overwrought analogy. If I'm even directionally right, the community best equipped to diagnose structural traps may be sitting inside one. Given the stakes, the second risk is worth the cost of the first.
