AI Governance Program Associate @ Open Philanthropy
1513 karma · Working (0-5 years)


(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil's resources to improve the governance of AI, with a focus on avoiding catastrophic outcomes. I was formerly co-founder of the Cambridge Boston Alignment Initiative (which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond), co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and an occasional AI governance researcher. I'm also a proud GWWC pledger and vegan.


Yes, but it's kind of incoherent to talk about the dollar value of something without having a budget and an opportunity cost; it has to be your willingness-to-pay, not some dollar value in the abstract. Like, it's not the case that the EA funding community would pay $500B even for huge wins like malaria eradication, an end to factory farming, a robust AI alignment solution, etc., because it's impossible: we don't have $500B.

And I haven't thought about this much, but it seems like we also wouldn't pay, say, $500M for a 1-in-1000 chance at a "$500B win" (even though that bet breaks even in expected value), because unless you're defining "$500B win" with respect to your actual willingness-to-pay, you might wind up with many opportunities to take these kinds of moonshots and quickly run out of money. The dollar size of the win still has to ultimately account for your budget.
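A quick sketch of the budget point in Python. The $500M cost and 1-in-1000 odds come from the comment above; the $2B budget is a hypothetical placeholder (not a real figure), and the sketch assumes the moonshots are independent of each other.

```python
import math

# Moonshot parameters from the comment: $500M for a 1-in-1000 shot at a "$500B win".
COST = 500e6
P_WIN = 1 / 1000
WIN_VALUE = 500e9
# Hypothetical total budget, chosen only for illustration (not a real figure).
BUDGET = 2e9


def expected_value(p: float, value: float) -> float:
    """Probability-weighted payoff of a single bet."""
    return p * value


def affordable_bets(budget: float, cost: float) -> int:
    """How many such bets fit in the budget before it runs out."""
    return int(budget // cost)


ev = expected_value(P_WIN, WIN_VALUE)
# EV equals cost, so each bet looks "fair" when valued in the abstract.
breaks_even = math.isclose(ev, COST)
n_bets = affordable_bets(BUDGET, COST)
# Assuming independent bets: chance that at least one of them pays off.
p_any_win = 1 - (1 - P_WIN) ** n_bets

print(f"EV per bet: ${ev / 1e6:,.0f}M vs. cost ${COST / 1e6:,.0f}M (breaks even: {breaks_even})")
print(f"Bets affordable before the budget is gone: {n_bets}")
print(f"Chance at least one pays off: {p_any_win:.2%}")
```

Under these made-up numbers, every bet looks fair in expected-value terms, but the budget covers only four of them, with well under a 1% chance that any pays off. That's roughly the sense in which the "$500B" label has to be defined relative to what you can actually spend.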

Well, it implies you could change the election with those amounts if you knew exactly how close the election would be in each state and spent optimally. But suppose the estimates are off by an order of magnitude, half of your spending goes to states that turn out not to be useful (which matches a ~30-minute analysis I did a few months ago), and you face significant diminishing returns, such that $10M-$100M is 3x less impactful than the first $10M and $100M-$1B is another 3x less impactful. You still get:

  • First $10M is ~$10k per key vote = 1,000 votes (enough to swing a 2,000-vote margin)
  • Next $90M is ~$30k per key vote = 3,000 votes
  • Next $900M is ~$90k per key vote = 10,000 votes
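The tier arithmetic can be checked with a short Python sketch. The per-vote costs are taken straight from the bullets (each tier is about 3x as expensive per key vote as the one before); the function name is just illustrative.

```python
# Spending tiers from the bullets: (dollars in tier, cost per key vote).
# The per-vote costs already fold in the adjustments described above
# (estimates off by an order of magnitude, half of spending wasted,
# diminishing returns).
TIERS = [
    (10e6, 10e3),   # first $10M at ~$10k per key vote
    (90e6, 30e3),   # next $90M at ~$30k per key vote (3x worse)
    (900e6, 90e3),  # next $900M at ~$90k per key vote (another 3x worse)
]


def votes_by_tier(tiers):
    """Return (votes bought in the tier, cumulative votes) for each tier."""
    out, total = [], 0
    for spend, cost_per_vote in tiers:
        votes = round(spend / cost_per_vote)
        total += votes
        out.append((votes, total))
    return out


for (spend, cpv), (votes, cumulative) in zip(TIERS, votes_by_tier(TIERS)):
    print(f"${spend / 1e6:,.0f}M at ${cpv / 1e3:.0f}k/vote -> "
          f"{votes:,} votes (cumulative {cumulative:,})")
```

The cumulative column is where the 1,000 / 4,000 / 14,000 vote thresholds in the next paragraph come from.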

If you think there's a major difference between the candidates, you might value the election in the billions; let's say $10B for the sake of calculation. Then the first $10M would be worth it if there's at least a 0.1% chance the election is decided by <1,000 votes (which of course happened 6 elections ago!), the next $90M is worth it if there's at least a 0.9% chance the election is decided by between 1,000 and 4,000 votes, and the next $900M is worth it if there's at least a 9% chance the election is decided by between 4,000 and 14,000 votes. IMO the first two probably pass and the last one probably doesn't, but idk.
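The break-even condition here is that a tier is worth funding if P(that tier is decisive) × V ≥ spend, i.e., P ≥ spend / V. A minimal Python sketch, assuming the illustrative V = $10B and treating each tier as decisive only when the final margin falls in its vote range (the function name is hypothetical):

```python
# Illustrative value of the election from the comment (for the sake of calculation).
ELECTION_VALUE = 10e9

# (spend in tier, range of final margins in votes that this tier could flip)
TIERS = [
    (10e6, (0, 1_000)),
    (90e6, (1_000, 4_000)),
    (900e6, (4_000, 14_000)),
]


def break_even_probability(spend: float, value: float) -> float:
    """Minimum P(tier is decisive) for the spend to pay off: P * value >= spend."""
    return spend / value


for spend, (low, high) in TIERS:
    p = break_even_probability(spend, ELECTION_VALUE)
    print(f"${spend / 1e6:,.0f}M: worth it if P(margin of {low:,}-{high:,} votes) >= {p:.1%}")
```

Whether each tier "passes" then comes down to your own forecast of how likely a margin in that range is, which the sketch deliberately leaves out.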

It seems like you might be underweighting the cumulative amount of resources. Even if you apply a pretty heavy decay rate (and it's unclear you should; usually we think of philanthropic investments as compounding over time), avoiding nuclear war was a top global priority for decades, and it feels like we have a lot of intellectual and policy "legacy infrastructure" from that.

Yeah, this is all pretty compelling, thanks!


I think some of the AI safety policy community has over-indexed on the visual model of the "Overton Window" and under-indexed on alternatives like the "ratchet effect," "poisoning the well," "clown attacks," and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable.

I'm not familiar with a lot of systematic empirical evidence on either side, but it seems to me that the more effective actors in the DC establishment are, overall, much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies. But it seems possible that at least some versions of "Overton Window-moving" strategies, as executed in practice, do more harm than good. The harm comes from associating their "side" with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who lean heavily on signals of credibility and consensus when quickly evaluating policy options; the good comes from increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.

In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea "outside the window" and this actually makes the window narrower. But I think the visual imagery of "windows" actually struggles to accommodate this (when was the last time you tried to open a window and accidentally closed it instead?), and as a result, people who rely on this model are more likely to underrate these kinds of consequences.

I'd be interested in empirical evidence on this question (ideally actual studies from the psych, political science, sociology, econ, etc. literatures, rather than specific case studies, because of reference class tennis issues).

Yes, some regulations backfire, and this is a good flag to keep in mind when designing policy. But to actually make the reference-class argument here work, you'd have to show that this is what we should expect from AI policy. That would include showing that failures like NEPA are either much more relevant to the AI case, or more numerous, than more successful regulations like (in my opinion) the Clean Air Act, Sarbanes-Oxley, and the bans on CFCs and leaded gasoline. I know it's not quite as simple as "I would simply design good regulations instead of bad ones," but it's also not as simple as "some regulations are really counterproductive, so you shouldn't advocate for any." Among other things, this assumes that nobody else will be pushing for really counterproductive regulations!

This post correctly identifies some of the major obstacles to governing AI, but it ultimately argues that "by default, governments will not regulate AI well," rather than the claim implied by its title, which is that advocating for (specific) AI regulations is net negative. That's a type of fallacious conflation I recognize all too well from my own libertarian past.

Interesting! I actually wrote a piece on "the ethics of 'selling out'" in The Crimson almost 6 years ago (jeez) that was somewhat more explicit in its EA justification, and I'm curious what you make of those arguments.

I think randomly selected Harvard students (among those who have the option to do so) deciding to take high-paying jobs and donate double-digit percentages of their salary to places like GiveWell is very likely better for the world than the random-ish other things they might have done, and for that reason I strongly support this op-ed. But for undergrads who are really committed to doing the most good, there are two things I would recommend instead. Both route through developing a solid understanding of the most important and tractable problems in the world by reading widely, asking good questions of knowledgeable people, doing their own writing and seeking feedback, and probably aggressively networking with the people working on these problems.

This enables much more effective earning to give: I think very plugged-in and reasonably informed donors can outperform even top grantmaking organizations in various ways, including by helping organizations diversify their funding, moving faster, and spotting opportunities that the grantmakers don't.

And it's also basically necessary for doing direct work on the world's most important problems. I think the generic advice to earn to give misses the huge variation in performance between individuals in direct work; if I understand correctly, 80,000 Hours agrees and thinks this should have been much more emphasized in their early writing and advice. Many Harvard students, in my view, could relatively quickly become excellent in roles like think tank research on AI policy or biosecurity, or operations at very impactful organizations. A smaller but nontrivial number could be excellent researchers on important philosophical or technical questions. I think it takes a lot of earning potential to beat those.

I object to saying that funding two public defenders "strictly dominates" being one yourself. While public defense isn't an especially high-variance role with respect to performance compared to, e.g., federal public policy, it doesn't seem that crazy that a really talented and dedicated public defender could be more impactful than the 2 or 3 marginal PDs they'd fund while earning to give.

The shape of my updates has been something like:

Q2 2023: Woah, looks like the AI Act might have a lot more stuff aimed at the future AI systems I'm most worried about than I thought! Making that go well now seems a lot more important than it did when it looked like it would mostly be focused on pre-foundation model AI. I hope this passes!

Q3 2023: As I learn more about this, it seems like a lot of the value is going to come from the implementation process. The same text in the actual Act could wind up either specifically requiring things that could meaningfully reduce the risks, or just imposing a lot of costs at many points in the process without actually aiming at the most important parts, depending on how the standard-setting orgs and member states operationalize it. But still, for that to happen at all, it needs to pass and not have the general-purpose AI stuff removed.

November 2023: Oh no, France and Germany want to take out the stuff I was excited about in Q2. Maybe this will not be very impactful after all.

December 2023: Oh good, it actually seems like they've figured out a way to focus the costs France/Germany were worried about on the very most dangerous AIs, so this will wind up being more like what I was hoping for pre-November, and it's now highly likely to pass!
