AI Governance Program Associate @ Open Philanthropy
1223 karma · Joined Jan 2022 · Working (0-5 years)



(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil's resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond, co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and occasional AI governance researcher. I'm also a proud GWWC pledger and vegan.


(I began working for OP on the AI governance team in June. I'm commenting in a personal capacity based on my own observations; other team members may disagree with me.)

OpenPhil sometimes uses its influence to put pressure on orgs to not do things that would disrupt the status quo

FWIW I really don’t think OP is in the business of preserving the status quo. People who work on AI at OP have a range of opinions on just about every issue, but I don't think any of us feel good about the status quo! People (including non-grantees) often ask us for our thoughts about a proposed action, and we’ll share if we think some action might be counterproductive, but many things we’d consider “productive” look very different from “preserving the status quo.” For example, I would consider the CAIS statement to be pretty disruptive to the status quo and productive, and people at Open Phil were excited about it and spent a bunch of time finding additional people to sign it before it was published.

Lots of people want to work there; replaceability

I agree that OP has an easier time recruiting than many other orgs, though perhaps a harder time than frontier labs. But at risk of self-flattery, I think the people we've hired would generally be hard to replace — these roles require a fairly rare combination of traits. People who have them can be huge value-adds relative to the counterfactual!

pretty hard to steer OP from within

I basically disagree with this. There are areas where senior staff have strong takes, but they'll definitely engage with the views of junior staff, and they sometimes change their minds. Also, the AI world is changing fast, and as a result our strategy has been changing fast, and there are areas full of new terrain where a new hire could really shape our strategy. (This is one way in which grantmaker capacity is a serious bottleneck.)

Nitpick: I would be sad if people ruled themselves out for e.g. being "20th percentile conscientiousness" since in my impression the popular tests for OCEAN are very sensitive to what implicit reference class the test-taker is using. 

For example, I took one a year ago and got third percentile conscientiousness, which seems pretty unlikely to be true given my abilities to e.g. hold down a grantmaking job, get decent grades in grad school, successfully run 50-person retreats, etc. I think the explanation is basically that this is how I respond to "I am often late for my appointments": "Oh boy, so true. I really am often rushing to my office for meetings and often don't join until a minute or two after the hour." And I could instead be thinking, "Well, there are lots of people who just regularly completely miss appointments, don't pay bills, lose jobs, etc. It seems to me like I'm running late a lot, but I should be accounting for the vast diversity of human experience and answer 'somewhat disagree'." But the first thing is way easier; you kinda have to know about this issue with the test to do the second thing.

(Unless you wouldn't hire someone because they were only ~1.3 standard deviations more conscientious than I am, which is fair I guess!)

Reposting my LW comment here:

Just want to plug Josh Greene's great book Moral Tribes here (disclosure: he's my former boss). Moral Tribes basically makes the same argument in different/more words: we evolved moral instincts that usually serve us pretty well, and the tricky part is realizing when we're in a situation that requires us to pull out the heavy-duty philosophical machinery.

Huh, it really doesn't read that way to me. Both are pretty clear causal paths to "the policy and general coordination we get are better/worse as a result."

Most of these have the downside of not giving the accused the chance to respond, and thereby not giving the community the chance to evaluate both the criticism and the response (which, as I wrote recently, isn't necessarily a dominant consideration, but it is an upside of the public writeup).

FWIW, it seems like the positive performance is more censored in expectation than the negative performance: while a case that CH handled poorly could either be widely discussed or never heard about again, I'm struggling to think of how we'd all hear about a case that they handled well, since part of handling it well likely involves the thing not escalating into a big deal and respecting people's requests for anonymity and privacy.

It does seem like a big drawback that the accused don't know the details of the accusations, but it also seems like there are obvious tradeoffs here, and it would make sense for this to be very different from the criminal justice system given the difference in punishments (loss of professional and financial opportunities and social status vs. actual prison time).

Agreed that a survey seems really good.

Thanks for writing this up!

I hope to write a post about this at some point, but since you raise some of these arguments, I think the most important cruxes for a pause are:

  1. It seems like in many people's models, the reason the "snap back" is problematic is that the productivity of safety research is much higher when capabilities are close to the danger zone, both because the AIs that we're using to do safety research are better and because the AIs that we're doing the safety research on are more similar to the ones in the danger zone. If the "snap back" reduces the amount of calendar time during which we think AI safety research will be most productive in exchange for giving us more time overall, this could easily be net negative. On the other hand, a pause might just "snap back" to somewhere on the capabilities graph that's still outside the danger zone, and lower than it would've been without the pause for the reasons you describe.
  2. A huge empirical uncertainty I have is: how elastic is the long-term supply curve of compute? If, on one extreme end, the production of computing hardware for the next 20 years is set in stone, then at the end of the pause there would be a huge jump in how much compute a developer could use to train a model, which seems pretty likely to produce a destabilizing/costly jump. At the other end, if compute supply were very responsive to expected AI progress and a pause would mean a big cut to e.g. Nvidia's R&D budget and TSMC shelved plans for a leading-node fab or two as a result, the jump would be much less worrying in expectation. I've heard that the industry plans pretty far in advance because of how much time and money it takes to build a fab (and how much coordination is required between the different parts of the supply chain), but it seems like at this point a lot of the future expected revenue to be won from designing the next generations of GPUs comes from their usefulness for training huge AI systems, so it seems like there should at least be some marginal reduction in long-term capacity if there were a big regulatory response.

Agree, basically any policy job seems to start teaching you important stuff about institutional politics and process and the culture of the whole political system!

Though I should also add this important-seeming nuance I gathered from a pretty senior policy person who said basically: "I don't like the mindset of, get anywhere in the government and climb the ladder and wait for your time to save the day; people should be thinking of it as proactively learning as much as possible about their corner of the government-world, and ideally sharing that information with others."

A suggestion for how people might develop this expertise from ~scratch, in a way that should be adaptable to e.g. an undergraduate or grad-level course, or independent research (a much better/stronger version of things I've done in the past, which involved lots of talking and take-developing but not much detail or publication -- both of which I think are really important):

  1. Figure out who, both within the EA world and beyond, knows at least a fair amount about this topic -- maybe they can explain why it's useful in more context than you have, or they know which papers you should read or acronyms you should familiarize yourself with. Talk to them, roughly in increasing order of scariness/value of their time, so that you've at least had a few conversations by the time you're talking to the scariest/highest-time-value people. Maybe this is a list of 5-10 people?
  2. During these conversations, take note of what's confusing you, ideas that you have, connections you or your interlocutors draw between topics, takes you find yourself repeating, etc.; you're on the hunt for a first project.
  3. Use the "learning by writing" method and just try to write "what you think should happen" in this area, as in, a specific person (maybe a government agency, maybe a funder in EA) should take a specific action, with as much detail as you can, noting a bunch of ways it could go wrong and how you propose to overcome these obstacles.
  4. Treat this proposal as a hypothesis that you then test (meaning, you have some sense of what could convince you it's wrong), and you seek out tests for it, e.g. talking to more experts about it (or asking them to read your draft and give feedback), finding academic or non-academic literature that bears on the important cruxes, etc., and revise your proposal (including scrapping it) as implied by the evidence.
  5. Try to publish something from this exercise -- maybe it's the proposal, maybe it's "hey, it turns out lots of proposals in this domain hinge on this empirical question," maybe it's "here's why I now think [topic] is a dead end." This gathers more feedback and importantly circulates the information that you've thought about it a nonzero amount.

Curious what other approaches people recommend!

A technique I've found useful for complex decisions where you gather lots of evidence over time -- for example, deciding what to do after graduation, or whether to change jobs, where you talk to lots of different people and weigh lots of considerations -- is to make a spreadsheet of all the arguments you hear, each with a score for how much it supports each option.

For example, this summer, I was considering the options of "take the Open Phil job," "go to law school," and "finish the master's." I put each of these options in columns. Then, I'd hear an argument like "being in school delays your ability to take a full-time job, which is where most of your impact will happen"; I'd add a row for this argument. I thought this was a very strong consideration, so I gave the Open Phil job 10 points, law school 0, and the master's 3 (since it was one more year of school instead of 3 years). Later, I'd hear an argument like "legal knowledge is actually pretty useful for policy work," which I thought was a medium-strength consideration, and I'd give these options 0, 5, and 0.

I wouldn't take the sum of these as a final answer, but it was useful for a few reasons:

  • In complicated decisions, it's hard to hold all of the arguments in your head at a time. This might be part of why I noticed a strong recency bias, where the most recent handful of considerations raised to me seemed the most important. By putting them all in one place, I could feel like I was properly accounting for all the things I was aware of.
  • Relatedly, it helped me avoid double-counting arguments. When I'd talk to a new person, and they'd give me an opinion, I could just check whether their argument was basically already in the spreadsheet; sometimes I'd bump a number from 4 to 5, or something, based on them being persuasive, but sometimes I'd just say, "Oh, right, I guess I already knew this and shouldn't really update from it."
  • I also notice a temptation to simplify the decision down to a single crux or knockdown argument, but usually cluster thinking is a better way to make these decisions, and the spreadsheet helps aggregate things such that an overall balance of evidence can carry the day.
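The scoring scheme above is simple enough to sketch in a few lines of code. This is just an illustration of the bookkeeping, using the two example arguments and the (hypothetical, from the text) scores of 10/0/3 and 0/5/0; as noted above, the column sums are an input to cluster thinking, not a final answer.

```python
# Options under consideration (columns of the spreadsheet).
options = ["Open Phil job", "Law school", "Finish master's"]

# Each argument (row) maps to a score per option, in the same order as
# `options`. Scores here are the illustrative ones from the text.
arguments = {
    "school delays full-time impact": [10, 0, 3],
    "legal knowledge useful for policy": [0, 5, 0],
}

# Sum each option's column to get an overall balance of evidence.
totals = {opt: 0 for opt in options}
for scores in arguments.values():
    for opt, score in zip(options, scores):
        totals[opt] += score

for opt in options:
    print(f"{opt}: {totals[opt]}")
```

Adding a new argument is just adding a row, and checking whether a new person's point is already in the sheet (to avoid double-counting) is just scanning the keys.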