
I’m planning a leave of absence (aiming for around 3 months and potentially more) from Open Philanthropy, starting on March 8, to explore working directly on AI safety.

I have a few different interventions I might explore. The first I'll explore is AI safety standards: documented expectations (enforced via self-regulation at first, and potentially government regulation later) that AI labs won't build and deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime. (More here.) There's significant interest from some AI labs in self-regulating via safety standards, and I want to see whether I can help with the work ARC and others are doing to hammer out standards that are both protective and practical - to the point where major AI labs are likely to sign on.

During my leave, Alexander Berger will serve as sole CEO of Open Philanthropy (as he did during my parental leave in 2021).

Depending on how things play out, I may end up working directly on AI safety full-time. Open Philanthropy will remain my employer for at least the start of my leave, but I’ll join or start another organization if I go full-time.

The reasons I’m doing this:

First, I’m very concerned about the possibility that transformative AI could be developed soon (possibly even within the decade - I don’t think this is >50% likely, but it seems too likely for my comfort). I want to be as helpful as possible, and I think the way to do this might be via working on AI safety directly rather than grantmaking.

Second, as a general matter, I’ve always aspired to help build multiple organizations rather than running one indefinitely. I think the former is a better fit for my talents and interests.

  • At both organizations I’ve co-founded (GiveWell and Open Philanthropy), I’ve had a goal from day one of helping to build an organization that can be great without me - and then moving on to build something else.
  • I think this went well with GiveWell thanks to Elie Hassenfeld’s leadership. I hope Open Philanthropy can go well under Alexander’s leadership.
  • Trying to get to that point has been a long-term project. Alexander, Cari, Dustin and I have been actively discussing the path to Open Philanthropy running without me since 2018.[1] Our mid-2021 promotion of Alexander to co-CEO was a major step in this direction (putting him in charge of more than half of the organization's employees and giving), and this is another step, which we've been discussing and preparing for for over a year (and announced internally at Open Philanthropy on January 20).

I’ve become increasingly excited about various interventions to reduce AI risk, such as working on safety standards. I’m looking forward to experimenting with focusing my energy on AI safety.

Footnotes

    1. This was only a year after Open Philanthropy became a separate organization, but it was several years after Open Philanthropy started as part of GiveWell under the title “GiveWell Labs.” 

Comments

[anonymous]
As AI heats up, I'm excited and frankly somewhat relieved to have Holden making this change. While I agree with 𝕮𝖎𝖓𝖊𝖗𝖆's comment below that Holden had a lot of leverage on AI safety in his recent role, I also believe he has a vast amount of domain knowledge that can be applied more directly to problem solving. We're in shockingly short supply of that kind of person, and the need is urgent.

Alexander has my full confidence in his new role as the sole CEO. I consider us incredibly fortunate to have someone like him already involved and prepared to succeed as the leader of Open Philanthropy.

My understanding is that Alexander has different views from Holden in that he prioritises global health and wellbeing over longtermist cause areas. Is there a possibility that Open Phil's longtermist giving decreases due to having a "non-longtermist" at the helm?

[anonymous]

I believe that’s an oversimplification of what Alexander thinks but don’t want to put words in his mouth.

In any case, this is one of the few decisions the 4 of us (including Cari) have always made together, so we have done a lot of aligning already. My current view, which is mostly shared, is that we're currently underfunding x-risk even without longtermism math, both because FTXF went away and because I've updated towards shorter AI timelines in the past ~5 years. And even aside from that, we weren't at full theoretical budget last year anyway. So that all nets out to an expected increase, not a decrease.

I’d love to discover new large x-risk funders though and think recent history makes that more likely.

OK, thanks for sharing!

And yes I may well be oversimplifying Alexander's view.

Ofer

In your recent Cold Takes post you disclosed that your wife owns equity in both OpenAI and Anthropic. (She was appointed to a VP position at OpenAI, as was her sibling, after you joined OpenAI's board of directors[1]). In 2017, under your leadership, OpenPhil decided to generally stop publishing "relationship disclosures". How do you intend to handle conflicts of interest, and transparency about them, going forward?

You wrote here that the first intervention that you'll explore is AI safety standards that will be "enforced via self-regulation at first, and potentially government regulation later". AI companies can easily end up with "self-regulation" that is mostly optimized to appear helpful, in order to avoid regulation by governments. Conflicts of interest can easily influence decisions w.r.t. regulating AI companies (mostly via biases and self-deception, rather than via conscious reasoning).


    1. EDIT: you joined OpenAI's board of directors as part of a deal between OpenPhil and OpenAI that involved recommending a $30M grant to OpenAI. ↩︎

Can Holden clarify whether, and if so what proportion of, those shares in OpenAI and Anthropic are legally pledged for donation?

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)

Thanks for the clarification.

I notice that I am surprised and confused.

I'd have expected Holden to contribute much more to AI existential safety as CEO of Open Philanthropy (career capital, comparative advantage, specialisation, etc.) than via direct work.

I don't really know what to make of this.

That said, it sounds like you've given this a lot of deliberation and have a clear plan/course of action.

I'm excited about your endeavours in the project!

RE direct work, I would generally think of the described role as still a form of "leadership" - coordinating actors in the present - unlike "writing research papers" or "writing code". I expect Holden to have a strong comparative advantage at leadership-type work.

Yes, it would be very different if he'd said "I'm going to skill up on ML and get coding"!

(I work at Open Phil, speaking for myself)

FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.

Buck

I don't think Holden agrees with this as much as you might think. For example, he spent a lot of his time in the last year or two writing a blog.

I've been meaning to ask: Are there plans to turn your Cold Takes posts on AI safety and The Most Important Century into a published book? I think the posts would make for a very compelling book, and a book could reach a much broader audience and would likely get much more attention. (This has pros and cons of course, as you've discussed in your posts.)

Neat! Cover jacket could use a graphic designer in my opinion. It's also slotted under engineering? Am I missing something?

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

For what it's worth, I don't see an option to buy a kindle version on Amazon - screenshot here

I think this was a goof due to there being a separate hardcover version, which has now been removed - try again?

Is it at all fair to say you’re shifting your strategy from a “marathon” to a “sprint” strategy? I.e. prioritising work that you expect to help soon instead of later.

Is this move due to your personal timelines shortening?

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

I'd love to chat with you about directions here, if you're interested. I don't know anyone with a bigger value of p(survival | West Wing levels of competence in major governments) - p(survival | leave it to OpenAI and DeepMind leadership). I've published technical AI existential safety research at top ML conferences/journals, and I've gotten two MPs in the UK onside this week. You can see my work at michael-k-cohen.com, and you can reach me at michael.cohen@eng.ox.ac.uk.

You may have already thought of this, but one place to start exploring what AI standards might look like is exploring what other safety standards for developing risky new things do in fact look like. The one I'm most familiar with (but not at all an expert on) is DO-178C Level A, the standard for developing avionics software where a bug could crash the plane. "Softer" examples worth looking at would include the SOC2 security certification standards.

I wrote a related thing here as a public comment to the NIST regulation framework developers, who I presume are high on your list to talk to as well: https://futuremoreperfect.substack.com/p/ai-regulation-wonkery

I’m in no position to judge how you should spend your time all things considered, but for what it’s worth, I think your blog posts on AI safety have been very clear and thoughtful, and I frequently recommend them to people (example). For example, I’ve started using the phrase “The King Lear Problem” from time to time (example).

Anyway, good luck! And let me know if there’s anything I can do to help you. 🙂

I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of the work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down, and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I'd be shocked if you didn't end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.

are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps

I don't think they claim to have better longer-term prospects, though.

I think they do? Nate at least says he’s optimistic about finding a solution given more time

MIRI folk believe they have an unusually clear understanding of risks

"Believe" being the operative word here. I really don't think they do.

I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.

I don't think they would claim to have significantly better predictive models in a positive sense, they just have far stronger models of what isn't possible and cannot work for ASI, and it constrains their expectations about the long term far more. (I'm not sure I agree with, say, Eliezer about his view of uselessness of governance, for example - but he has a very clear model, which is unusual.) I also don't think their view about timelines or takeoff speeds is really a crux - they have claimed that even if ASI is decades away, we still can't rely on current approaches to scale.
