Taking a leave of absence from Open Philanthropy to work on AI safety

Holden Karnofsky

I’m planning a leave of absence (aiming for around 3 months and potentially more) from Open Philanthropy, starting on March 8, to explore working directly on AI safety.

I have a few different interventions I might explore. The first I’ll explore is AI safety standards: documented expectations (enforced via self-regulation at first, and potentially government regulation later) that AI labs won’t build and deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime. (More here.) There’s significant interest from some AI labs in self-regulating via safety standards, and I want to see whether I can help with the work ARC and others are doing to hammer out standards that are both protective and practical - to the point where major AI labs are likely to sign on.

During my leave, Alexander Berger will serve as sole CEO of Open Philanthropy (as he did during my parental leave in 2021).

Depending on how things play out, I may end up working directly on AI safety full-time. Open Philanthropy will remain my employer for at least the start of my leave, but I’ll join or start another organization if I go full-time.

The reasons I’m doing this:

First, I’m very concerned about the possibility that transformative AI could be developed soon (possibly even within the decade - I don’t think this is >50% likely, but it seems too likely for my comfort). I want to be as helpful as possible, and I think the way to do this might be via working on AI safety directly rather than grantmaking.

Second, as a general matter, I’ve always aspired to help build multiple organizations rather than running one indefinitely. I think the former is a better fit for my talents and interests.

At both organizations I’ve co-founded (GiveWell and Open Philanthropy), I’ve had a goal from day one of helping to build an organization that can be great without me - and then moving on to build something else.
I think this went well with GiveWell thanks to Elie Hassenfeld’s leadership. I hope Open Philanthropy can go well under Alexander’s leadership.
Trying to get to that point has been a long-term project. Alexander, Cari, Dustin and I have been actively discussing the path to Open Philanthropy running without me since 2018.¹ Our mid-2021 promotion of Alexander to co-CEO was a major step in this direction (putting him in charge of more than half of the organization’s employees and giving), and this is another step, which we’ve been discussing and preparing for for over a year (and announced internally at Open Philanthropy on January 20).

I’ve become increasingly excited about various interventions to reduce AI risk, such as working on safety standards. I’m looking forward to experimenting with focusing my energy on AI safety.

Footnotes

1. This was only a year after Open Philanthropy became a separate organization, but it was several years after Open Philanthropy started as part of GiveWell under the title “GiveWell Labs.”

Comments (31)

[anonymous] · Feb 23 2023 · 241

As AI heats up, I'm excited and frankly somewhat relieved to have Holden making this change. While I agree with 𝕮𝖎𝖓𝖊𝖗𝖆's comment below that Holden had a lot of leverage on AI safety in his recent role, I also believe he has a vast amount of domain knowledge that can be applied more directly to problem solving. We're in shockingly short supply of that kind of person, and the need is urgent.

Alexander has my full confidence in his new role as the sole CEO. I consider us incredibly fortunate to have someone like him already involved and prepared to succeed as the leader of Open Philanthropy.

JKM · Feb 25 2023 · 21

My understanding is that Alexander has different views from Holden in that he prioritises global health and wellbeing over longtermist cause areas. Is there a possibility that Open Phil's longtermist giving decreases due to having a "non-longtermist" at the helm?

[anonymous] · Feb 25 2023 · 80

I believe that’s an oversimplification of what Alexander thinks but don’t want to put words in his mouth.

In any case, this is one of the few decisions the 4 of us (including Cari) have always made together, so we have done a lot of aligning already. My current view, which is mostly shared, is that we’re currently underfunding x-risk even without longtermism math, both because FTXF went away and because I’ve updated towards shorter AI timelines in the past ~5 years. And even aside from that, we weren’t at full theoretical budget last year anyway. So that all nets out to an expected increase, not a decrease.

I’d love to discover new large x-risk funders though and think recent history makes that more likely.

JKM · Feb 25 2023 · 2

OK, thanks for sharing!

And yes I may well be oversimplifying Alexander's view.

Ofer · Feb 24 2023 · 105

In your recent Cold Takes post you disclosed that your wife owns equity in both OpenAI and Anthropic. (She was appointed to a VP position at OpenAI, as was her sibling, after you joined OpenAI's board of directors [1].) In 2017, under your leadership, OpenPhil decided to generally stop publishing "relationship disclosures". How do you intend to handle conflicts of interest, and transparency about them, going forward?

You wrote here that the first intervention that you'll explore is AI safety standards that will be "enforced via self-regulation at first, and potentially government regulation later". AI companies can easily end up with "self-regulation" that is mostly optimized to appear helpful, in order to avoid regulation by governments. Conflicts of interest can easily influence decisions w.r.t. regulating AI companies (mostly via biases and self-deception, rather than via conscious reasoning).

[1] EDIT: you joined OpenAI's board of directors as part of a deal between OpenPhil and OpenAI that involved recommending a $30M grant to OpenAI.

CuriousEA · Feb 24 2023 · 22

Can Holden clarify whether, and if so what proportion of, those shares in OpenAI and Anthropic are legally pledged for donation?

Holden Karnofsky · Mar 22 2023 · 16

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)

CuriousEA · Mar 29 2023 · 1

Thanks for the clarification.

𝕮𝖎𝖓𝖊𝖗𝖆 · Feb 23 2023 · 88

I notice that I am surprised and confused.

I'd have expected Holden to contribute much more to AI existential safety as CEO of Open Philanthropy (career capital, comparative advantage, specialisation, etc.) than via direct work.

I don't really know what to make of this.

That said, it sounds like you've given this a lot of deliberation and have a clear plan/course of action.

I'm excited about your endeavours in the project!

catherio · Feb 23 2023 · 40

RE direct work, I would generally think of the described role as still a form of "leadership" — coordinating actors in the present — unlike "writing research papers" or "writing code". I expect Holden to have a strong comparative advantage at leadership-type work.

Michael_PJ · Feb 23 2023 · 17

Yes, it would be very different if he'd said "I'm going to skill up on ML and get coding"!

Ajeya · Feb 24 2023 · 22

(I work at Open Phil, speaking for myself)

FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.

Buck · Feb 24 2023 · 12

I don't think Holden agrees with this as much as you might think. For example, he spent a lot of his time in the last year or two writing a blog.

Aryeh Englander · Feb 23 2023 · 42

I've been meaning to ask: Are there plans to turn your Cold Takes posts on AI safety and The Most Important Century into a published book? I think the posts would make for a very compelling book, and a book could reach a much broader audience and would likely get much more attention. (This has pros and cons of course, as you've discussed in your posts.)

WilliamKiely🔸 · Feb 24 2023 · 23

Amazon: The Most Important Century Paperback – February 12, 2022 by Holden Karnofsky

JoelMcGuire · Feb 24 2023 · 14

Neat! Cover jacket could use a graphic designer in my opinion. It's also slotted under engineering? Am I missing something?

Holden Karnofsky · Mar 21 2023 · 4

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

DanielFilan · May 25 2023 · 1

For what it's worth, I don't see an option to buy a kindle version on Amazon - screenshot here

Holden Karnofsky · Jun 2 2023 · 2

I think this was a goof due to there being a separate hardcover version, which has now been removed - try again?

DanielFilan · Jun 5 2023 · 1

This link works.

GMcGowan · Feb 23 2023 · 30

Is it at all fair to say you’re shifting your strategy from a “marathon” to a “sprint” strategy? I.e. prioritising work that you expect to help soon instead of later.

Is this move due to your personal timelines shortening?

Holden Karnofsky · Mar 21 2023 · 5

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

Michael_Cohen · Feb 24 2023 · 17

I'd love to chat with you about directions here, if you're interested. I don't know anyone with a bigger value of p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I've published technical AI existential safety research at top ML conferences/journals, and I've gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at michael.cohen@eng.ox.ac.uk.

Nicholas Weininger · Mar 3 2023 · 8

You may have already thought of this, but one place to start exploring what AI standards might look like is exploring what other safety standards for developing risky new things do in fact look like. The one I'm most familiar with (but not at all an expert on) is DO-178C Level A, the standard for developing avionics software where a bug could crash the plane. "Softer" examples worth looking at would include the SOC2 security certification standards.

I wrote a related thing here as a public comment to the NIST regulation framework developers, who I presume are high on your list to talk to as well: https://futuremoreperfect.substack.com/p/ai-regulation-wonkery

Steven Byrnes · Mar 20 2023 · 2

I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).

Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂

David Johnston · Feb 24 2023 · 2

I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I'd be shocked if you didn't end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.

Davidmanheim · Mar 1 2023 · 2

"are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps"

I don't think they claim to have better longer-term prospects, though.

David Johnston · Mar 1 2023 · 1

I think they do? Nate at least says he’s optimistic about finding a solution given more time.

Guy Raveh · Feb 24 2023 · 0

"MIRI folk believe they have an unusually clear understanding of risks"

"Believe" being the operative word here. I really don't think they do.

David Johnston · Feb 25 2023 · 3

I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.

Davidmanheim · Mar 1 2023 · 2

I don't think they would claim to have significantly better predictive models in a positive sense; they just have far stronger models of what isn't possible and cannot work for ASI, and that constrains their expectations about the long term far more. (I'm not sure I agree with, say, Eliezer's view that governance is useless - but he has a very clear model, which is unusual.) I also don't think their view about timelines or takeoff speeds is really a crux - they have claimed that even if ASI is decades away, we still can't rely on current approaches to scale.