Taking a leave of absence from Open Philanthropy to work on AI safety

Holden Karnofsky

Feb 23, 2023 (2 min read)

Comments 31

[anonymous]

As AI heats up, I'm excited and frankly somewhat relieved to have Holden making this change. While I agree with 𝕮𝖎𝖓𝖊𝖗𝖆's comment below that Holden had a lot of leverage on AI safety in his recent role, I also believe he has a vast amount of domain knowledge that can be applied more directly to problem solving. We're in shockingly short supply of that kind of person, and the need is urgent.

Alexander has my full confidence in his new role as the sole CEO. I consider us incredibly fortunate to have someone like him already involved and prepared to succeed as the leader of Open Philanthropy.

JKM

My understanding is that Alexander has different views from Holden in that he prioritises global health and wellbeing over longtermist cause areas. Is there a possibility that Open Phil's longtermist giving decreases due to having a "non-longtermist" at the helm?

[anonymous]

I believe that’s an oversimplification of what Alexander thinks but don’t want to put words in his mouth.

In any case this is one of the few decisions the 4 of us (including Cari) have always made together so we have done a lot of aligning already. My current view, which is mostly shared, is we’re currently underfunding x-risk even without longtermism math, both because FTXF went away and because I’ve updated towards shorter AI timelines in the past ~5 years. And even aside from that, we weren’t at full theoretical budget last year anyway. So that all nets out to an expected increase, not a decrease.

I’d love to discover new large x-risk funders though and think recent history makes that more likely.

JKM

OK, thanks for sharing!

And yes I may well be oversimplifying Alexander's view.

Ofer

In your recent Cold Takes post you disclosed that your wife owns equity in both OpenAI and Anthropic. (She was appointed to a VP position at OpenAI, as was her sibling, after you joined OpenAI's board of directors[1]). In 2017, under your leadership, OpenPhil decided to generally stop publishing "relationship disclosures". How do you intend to handle conflicts of interest, and transparency about them, going forward?

You wrote here that the first intervention that you'll explore is AI safety standards that will be "enforced via self-regulation at first, and potentially government regulation later". AI companies can easily end up with "self-regulation" that is mostly optimized to appear helpful, in order to avoid regulation by governments. Conflicts of interest can easily influence decisions w.r.t. regulating AI companies (mostly via biases and self-deception, rather than via conscious reasoning).

[1] EDIT: you joined OpenAI's board of directors as part of a deal between OpenPhil and OpenAI that involved recommending a $30M grant to OpenAI.

CuriousEA

Can Holden clarify whether, and if so what proportion of, those shares in OpenAI and Anthropic are legally pledged for donation?

Holden Karnofsky

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)

CuriousEA

Thanks for the clarification.

𝕮𝖎𝖓𝖊𝖗𝖆

I notice that I am surprised and confused.

I'd have expected Holden to contribute much more to AI existential safety as CEO of Open Philanthropy (career capital, comparative advantage, specialisation, etc.) than via direct work.

I don't really know what to make of this.

That said, it sounds like you've given this a lot of deliberation and have a clear plan/course of action.

I'm excited about your endeavours in the project!

catherio

RE direct work, I would generally think of the described role as still a form of "leadership" — coordinating actors in the present — unlike "writing research papers" or "writing code". I expect Holden to have a strong comparative advantage at leadership-type work.

Michael_PJ

Yes, it would be very different if he'd said "I'm going to skill up on ML and get coding"!

Ajeya

(I work at Open Phil, speaking for myself)

FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.

Buck

I don't think Holden agrees with this as much as you might think. For example, he spent a lot of his time in the last year or two writing a blog.

Aryeh Englander

I've been meaning to ask: Are there plans to turn your Cold Takes posts on AI safety and The Most Important Century into a published book? I think the posts would make for a very compelling book, and a book could reach a much broader audience and would likely get much more attention. (This has pros and cons of course, as you've discussed in your posts.)

WilliamKiely🔸

Amazon: The Most Important Century Paperback – February 12, 2022 by Holden Karnofsky

JoelMcGuire

Neat! Cover jacket could use a graphic designer in my opinion. It's also slotted under engineering? Am I missing something?

Holden Karnofsky

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

DanielFilan

For what it's worth, I don't see an option to buy a kindle version on Amazon - screenshot here

Holden Karnofsky

I think this was a goof due to there being a separate hardcover version, which has now been removed - try again?

DanielFilan

This link works.

GMcGowan

Is it at all fair to say you’re shifting your strategy from a “marathon” to a “sprint” strategy? I.e. prioritising work that you expect to help soon instead of later.

Is this move due to your personal timelines shortening?

Holden Karnofsky

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

Michael_Cohen

I'd love to chat with you about directions here, if you're interested. I don't know anyone with a bigger value of p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I've published technical AI existential safety research at top ML conferences/journals, and I've gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at [email protected].

Nicholas Weininger

You may have already thought of this, but one place to start exploring what AI standards might look like is exploring what other safety standards for developing risky new things do in fact look like. The one I'm most familiar with (but not at all an expert on) is DO-178C Level A, the standard for developing avionics software where a bug could crash the plane. "Softer" examples worth looking at would include the SOC2 security certification standards.

I wrote a related thing here as a public comment to the NIST regulation framework developers, who I presume are high on your list to talk to as well: https://futuremoreperfect.substack.com/p/ai-regulation-wonkery

Steven Byrnes

I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).

Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂

David Johnston

I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I'd be shocked if you didn't end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.

Davidmanheim

"are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps"

I don't think they claim to have better longer-term prospects, though.

David Johnston

I think they do? Nate at least says he’s optimistic about finding a solution given more time.

Guy Raveh

"MIRI folk believe they have an unusually clear understanding of risks"

"Believe" being the operative word here. I really don't think they do.

David Johnston

I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.

Davidmanheim

I don't think they would claim to have significantly better predictive models in a positive sense; they just have far stronger models of what isn't possible and cannot work for ASI, and that constrains their expectations about the long term far more. (I'm not sure I agree with, say, Eliezer about his view of the uselessness of governance, for example - but he has a very clear model, which is unusual.) I also don't think their view about timelines or takeoff speeds is really a crux - they have claimed that even if ASI is decades away, we still can't rely on current approaches to scale.
