All of Holden Karnofsky&#x27;s Comments + Replies

My takes on the FTX situation will (mostly) be cold, not hot

I think this was a goof due to there being a separate hardcover version, which has now been removed - try again?

DanielFilan

This link works.

Time Article Discussion - "Effective Altruist Leaders Were Repeatedly Warned About Sam Bankman-Fried Years Before FTX Collapsed"

To give a rough idea, I basically mean anyone who is likely to harm those around them (using a common-sense idea of doing harm) and/or "pollute the commons" by having an outsized and non-consultative negative impact on community dynamics. It's debatable what the best warning signs are and how reliable they are.

Holden Karnofsky3y83

Re: "In the weeks leading up to that April 2018 confrontation with Bankman-Fried and in the months that followed, Mac Aulay and others warned MacAskill, Beckstead and Karnofsky about her co-founder’s alleged duplicity and unscrupulous business ethics" -

I don't remember Tara reaching out about this, and I just searched my email for signs of this and didn’t see any. I'm not confident this didn't happen, just noting that I can't remember or easily find signs of it.

In terms of what I knew/learned 2018 more generally, I discuss that here.

Taking a leave of absence from Open Philanthropy to work on AI safety

Holden Karnofsky3y16

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the ... (read more)

CuriousEA

Thanks for the clarification.

Some comments on recent FTX-related events

Holden Karnofsky3y32

There was no one with official responsibility for the relationship between FTX and the EA community. I think the main reason the two were associated was via FTX’s/Sam having a high profile and talking a lot about EA - that’s not something anyone else was able to control. (Some folks did ask him to do less of this.)

It’s also worth noting that we generally try to be cautious about power dynamics as a funder, which means we are hesitant to be pushy about most matters. In particular, I think one of two major funders in this space attacking the other, nudging g... (read more)

Some comments on recent FTX-related events

Holden Karnofsky3y28

In 2018, I heard accusations that Sam had communicated in ways that left people confused or misled, though often with some ambiguity about whether Sam had been confused himself, had been inadvertently misleading while factually accurate, etc. I put some effort into understanding these concerns (but didn’t spend a ton of time on it; Open Phil didn’t have a relationship with Sam or Alameda).

I didn’t hear anything that sounded anywhere near as bad as what has since come out about his behavior at FTX. At the time I didn’t feel my concerns rose to the level whe... (read more)

My takes on the FTX situation will (mostly) be cold, not hot

Holden Karnofsky3y3

I don’t believe #1 is correct. The Open Philanthropy grant is a small fraction of the funding OpenAI has received, and I don’t think it was crucial for OpenAI at any point.

I think #2 is fair insofar as running a scaling lab poses big risks to the world. I hope that OpenAI will avoid training or deploying directly dangerous systems; I think that even the deployments it’s done so far pose risks via hype and acceleration. (Considering the latter a risk to society is an unusual standard to hold a company to, but I think it’s appropriate here.)

#3 seems off to m... (read more)

Wei Dai2y12

#5 seems off to me. I don’t know whether OpenAI uses nondisparagement agreements;

Details about OpenAI's nondisparagement agreements have come out.

Spreading messages to help with the most important century

Spreading messages to help with the most important century

Just noting that many of the “this concept is properly explained elsewhere” links are also accompanied by expandable boxes that you can click to expand for the gist. I do think that understanding where I’m coming from in this piece requires a bunch of background, but I’ve tried to make it as easy on readers as I could, e.g. explaining each concept in brief and providing a link if the brief explanation isn’t clear enough or doesn’t address particular objections.

Taking a leave of absence from Open Philanthropy to work on AI safety

Noting that I’m now going back through posts responding to comments, after putting off doing so for months - I generally find it easier to do this in bulk to avoid being distracted from my core priorities, though this time I think I put it off longer than I should’ve.

It is generally true that my participation in comments is extremely sporadic/sparse, and folks should factor that into curation decisions.

Holden Karnofsky3y5

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

Taking a leave of absence from Open Philanthropy to work on AI safety

We're no longer "pausing most new longtermist funding commitments"

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series at https://www.cold-takes.com/tag/implicationsofmostimportantcentury/) into a proper book, but I’m not sure when or whether I’ll do this.

DanielFilan

For what it's worth, I don't see an option to buy a kindle version on Amazon - screenshot here

We're no longer "pausing most new longtermist funding commitments"

I expect more funding discontinuations than usual, but we generally try to discontinue funding in a way that gives organizations time to plan around the change.

I’m not leading the longer-term process. I expect Open Philanthropy will publish content about it, but I’m not sure when.

We're no longer "pausing most new longtermist funding commitments"

I don’t have a good answer, sorry. The difficulty of getting cardinal estimates for longtermist grants is a lot of what drove our decision to go with an ordinal approach instead.

Holden Karnofsky3y8

Aiming to spend down in less than 20 years would not obviously be justified even if one’s median for transformative AI timelines were well under 20 years. This is because we may want extra capital in a “crunch time” where we’re close enough to transformative AI for the strategic picture to have become a lot clearer, and because even a 10-25% chance of longer timelines would provide some justification for not spending down on short time frames.

This move could be justified if the existing giving opportunities were strong enough even with a lower bar. That may end up being the case in the future. But we don’t feel it’s the case today, having eyeballed the stack rank.

Tristan Cook

I agree. This lines with models of optimal spending I worked on which allowed for a post-fire alarm "crunch time" in which one can spend a significant fraction of remaining capital.

My takes on the FTX situation will (mostly) be cold, not hot

Holden Karnofsky3y106

Here’s a followup with some reflections.

Note that I discuss some takeaways and potential lessons learned in this interview.

Here are some (somewhat redundant with the interview) things I feel like I’ve updated on in light of the FTX collapse and aftermath:

The most obvious thing that’s changed is a tighter funding situation, which I addressed here.
I’m generally more concerned about the dynamics I wrote about in EA is about maximization, and maximization is perilous. If I wrote that piece today, most of it would be the same, but the “Avoiding the p

... (read more)

NewLeaf

Thanks so much for these reflections. Would you consider saying more about which other actions seem most promising to you, beyond articulating a robust case against "hard-core utilitarianism" and improving the community's ability to identify and warn about bad actors? For the reasons I gave here, I think it would be valuable for leaders in the EA community to be talking much more concretely about opportunities to reduce the risk that future efforts inspired by EA ideas might cause unintended harm.

James Herbert3y14

Thanks for the update! I agree with Nathan that this deserves its own post.

Re your last point, I always saw SBF/FTX (when things were going well) as a success story relating to E2G/billionaire philanthropy/maximisation/hardcore utilitarianism/risk-taking/etc. I feel these are the factors that make SBF's case distinctive, and the connection to community building is more tenuous.

This being the case, the whole thing has updated me away from those things, but it hasn't really updated my view on community building (other than that we should be doing... (read more)

Severin3y30

Thanks for writing this up. I agree with most of these points. However, not with the last one:

I think we should see “EA community building” as less valuable than before, if only because one of the biggest seeming success stories now seems to be a harm story. I think this concern applies to community building for specific issues as well.

If anything, I think the dangers and pitfalls of optimization you mention warrant different community building, not less. Specifically, I see two potential dangers to pulling resources out of community building:

Funded commun

... (read more)

peterhartree

Thoughts on “maximisation is perilous”: (1) We could put more emphasis on the idea of “two-thirds utilitarianism”. (2) I expect we could come up with a better name for two-thirds utilitarianism and a snappier way of describing the key thought. Deep pragmatism might work.

Jeffrey Kursonis

I laughed as I agreed about the "punchable" comment. Certainly, as a non STEM individual much of EA seems punchable to me, SBF's face in particular should inspire a line of punching bags embroidered with it. But for this to lead you to downgrade EA community building seems like wildly missing the point, which is to be less punchable, ie. more "normal", "likable", "relatable to average people". I say this from huge experience in movement building...the momentum and energy a movement like EA creates is tremendous and may even lead to saving the world, and it is simply a movement that has reached a maturation way point that uncovers common normal problems like when you show up to your first real job and discover your college kid cultural mindset needs an update. The problem is not EA community building, it is getting seduced by billionaire/elite/elon culture and getting sucked into it like Clinton hanging out with Epstein...Oops. Don't reduce growth energy to a rare energetic movement, just fix whatever sucked people in towards the big money. Said with much love and respect for all you and the early EA pioneers have done. I've seen movements falter, trip and fall...don't do that. Learn and adjust, but do not pull back. EA community building is literally the living body, you can't stop feeding it.

Arden Koehler

Hey Holden, Thanks for these reflections! Could you maybe elaborate on what you mean by a 'bad actor'? There's some part of me that feels nervous about this as a framing, at least without further specification -- like maybe the concept could be either applied too widely (e.g. to anyone who expresses sympathy with "hard-core utilitarianism", which I'd think wouldn't be right), or have a really strict definition (like only people with dark tetrad traits) in a way that leaves out people who might be likely to (or: have the capacity to?) take really harmful actions.

Nathan Young

Thanks for writing this. It feels off to me that this is a forum reply. Seems like it is important enough that it should be a post and then showed to people in accordance with that.

High-level hopes for AI alignment

Holden Karnofsky3y3

My point with the observation you quoted wasn't "This would be unprecedented, therefore there's a very low prior probability." It was more like: "It's very hard to justify >90% confidence on anything without some strong base rate to go off of. In this case, we have no base rate to go off of; we're pretty wildly guessing." I agree something weird has to happen fairly "soon" by zoomed-out historical standards, but there are many possible candidates for what the weird thing is (I also endorse dsj's comment below).

What AI companies can do today to help with the most important century

Jobs that can help with the most important century

If I saw a path to slowing down or stopping AI development, reliably and worldwide, I think it’d be worth considering.

But I don’t think advising particular AI companies to essentially shut down (or radically change their mission) is a promising step toward that goal.

And I think partial progress toward that goal is worse than none, if it slows down relatively caution-oriented players without slowing down others.

Spreading messages to help with the most important century

Not easily - I skimmed it before linking to it and thought "Eh, I would maybe reframe some of these if I were writing the post today," but found it easier to simply note that point than to do a rewrite or even a list of specific changes, given that I don't think the picture has radically changed.

High-level hopes for AI alignment

Agreed!

Habryka [Deactivated]3y11

Civilizational collapse would be a historically unprecedented event, and the future is very hard to predict; on those grounds alone, putting the odds of civilizational collapse above 90% seems like it requires a large burden of proof/argumentation. I don't think "We can't name a specific, likely-seeming path to success now" is enough to get there - I think there are many past risks to civilization that people worried about in advance, didn't see clear paths to dealing with, and yet didn't end up being catastrophic. Furthermore, I do envision some possible... (read more)

RobertM

Separate from object-level disagreements, my crux that people can have inside-view models which "rule out" other people's models (as well as outside-view considerations) in a way that leads to assigning very high likelihoods (i.e. 99%+) to certain outcomes. The fact that they haven't successfully communicated their models to you is certainly a reason for you to not update strongly in their direction, but it doesn't mean much for their internal epistemic stance.

Civilizational collapse would be a historically unprecedented event, and the future is very hard to predict;

I don't find this reasoning very compelling, mostly on the basis of "this can't go on"-type logic. Like, we basically know that the next century will be "historically unprecedented". Indeed, it would be really surprising if the next century would not be unprecedented, since humanity has never remotely been in a similar starting position.

We can't sustain current growth levels, stagnation at any specific point would be quite weird, and sudden col... (read more)

How might we align transformative AI if it’s developed very soon?

Nonprofit Boards are Weird

Very belatedly fixed - thanks!

Holden Karnofsky3y5

This sounds like it could be good for some organizations (e.g., membership organizations), though it's less clear how to make it work (who gets a vote?) for many other types of organizations.

Nonprofit Boards are Weird

AI Could Defeat All Of Us Combined

I broadly agree with these recommendations. I think they are partial but not full mitigations to the "weird" properties I mention, and often raise challenges of their own (though I think they're often worth it on balance).

I haven't seen much in the way of nonprofit boards with limited powers / outside-the-board accountability. (I haven't mostly dealt with membership organizations.) It definitely sounds interesting, but I don't have solid examples of how it's done in practice and what other issues are raised by that.

AI Could Defeat All Of Us Combined

These questions are outside the scope of this post, which is about what would happen if AIs were pointed at defeating humanity.

I don't think there's a clear answer to whether AIs would have a lot of their goals in common, or find it easier to coordinate with each other than with humans, but the probability of each seems at least reasonably high if they are all developed using highly similar processes (making them all likely more similar to each other in many ways than to humans).

The Wicked Problem Experience

Sorry for chiming in so late! The basic idea here is that if you have 2x the resources it would take to train a transformative model, then you have enough to run a huge number of them.

It's true that the first transformative model might eat all the resources its developer has at the time. But it seems likely that (a) given that they've raised $X to train it as a reasonably speculative project, once it turns out to be transformative there will probably be at least a further $X available to pay for running copies; (b) not too long after, as compute continues to get more efficient, someone will have the 2x the resources needed to train the model.

Why did CEA buy Wytham Abbey?

Apologies for chiming in so late!

I believe GWWC's recommendation of Against Malaria Foundation was based on GiveWell's (otherwise they might've recommended another bednet charity). And Peter Singer generally did not recommend the charities that GiveWell ranks highly, before GiveWell ranked them highly.

I don't want to deny, though, that for any given research project you might undertake, there's often a much quicker approach that gets you part of the way there. I think the process you described is a fine way to generate some good initial leads (I think GWWC... (read more)

Doing EA Better

Holden Karnofsky3y35

Thanks for the time you’ve put into trying to improve EA, and it’s unfortunate that you feel the need to do so anonymously!

Below are some reactions, focused on points that you highlighted to me over email as sections you’d particularly appreciate my thoughts on.

On anonymity - as a funder, we need to make judgments about potential grantees, but want to do so in a way that doesn't create perverse incentives. This section of an old Forum post summarizes how I try to reconcile these goals, and how I encourage others to. When evaluating potential grantees... (read more)

-5

A.C.Skraeling

1[anonymous]3y

What's best for spending Cari and Dustin's financial capital may not be what's best for the human community made up of EAs. One could even argue that the human capital in the EA community is roughly on par with or even exceeds the value of Good Ventures' capital. Just something to think about.

Noah Scales

Do you have specific concerns about how the capital is spent? That is, are you dissatisfied and looking to address concerns that you have or to solve problems that you have identified? I'm wondering about any overlap between your concerns and the OP's. I'd be glad for an answer or just a link to something written, if you have time.

Holden Karnofsky3y32

My take is about 90% in agreement with this.

The other 10% is something like: "But sometimes adding time and care to how, when, and whether you say something can be a big deal. It could have real effects on the first impressions you, and the ideas and communities and memes you care about, make on people who (a) could have a lot to contribute on goals you care about; (b) are the sort of folks for whom first impressions matter."

10% is maybe an average. I think it should be lower (5%?) for an early-career person who's prioritizing exploration, experiment... (read more)

Comments for shorter Cold Takes pieces

Hm. I contacted Nick and replaced it with another link - does that work?

Jeremy

Yup, works for me now.

Comments for shorter Cold Takes pieces

I didn't make a claim that constant replacement occurs "empirically." As far as I can tell, it's not possible to empirically test whether it does or not. I think we are left deciding whether we choose to think of ourselves as being constantly replaced, or not - either choice won't contradict any empirical observations. My post was pointing out that if one does choose to think of things that way, a lot of other paradoxes seem to go away.

Comments for shorter Cold Takes pieces

I personally like Radiohead a lot, but I don't feel like my subjective opinions are generally important here; with Pet Sounds I tried to focus on what seemed like an unusually clear-cut case (not that the album has nothing interesting going on, but that it's an odd choice for #1 of all time, especially in light of coming out a year after A Love Supreme).

The Wicked Problem Experience

I think this is interesting and plausible, but I'm somewhat skeptical in light of the fact that there doesn't seem to have been much (or at least, very effective) outcry over the rollback of net neutrality.

Holden Karnofsky4y5

I think this is often a good approach!

Yonatan Cale

This is the kindest way anyone ever told me that I didn't help ;) <3 <3 If anyone's interested, I just posted about this idea yesterday: https://www.lesswrong.com/posts/8BGexmqqAx5Z2KFjW/how-to-make-your-article-more-persuasive-spoiler-do-user

Important, actionable research questions for the most important century

Holden Karnofsky4y6

I think "people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work" is pretty valid, though I'll also comment that I think there are diminishing returns to direct experience - I think getting some experience (or at least exposure, e.g. via conversation with insiders) is important, but I don't think one necessarily needs several years inside key institutions in order to be helpful on problems like these.

Important, actionable research questions for the most important century

Consider trying the ELK contest (I am)

I don't have anything available for this offhand - I'd have to put serious thought into what questions are at the most productive intersection of "resolvable", "a good fit for Metaculus" and "capturing something important." Something about warning signs ("will an AI system steal at least $10 million?") could be good.

Thanks! I'd estimate another 10-15 hours on top of the above, so 20-30 hours total. A good amount of this felt like leisure time and could be done while not in front of a computer, which was nice. I didn't end up with "solutions" I'd be actually excited about for substantive progress on alignment, but I think I accomplished my goal of understanding the ELK writeup well enough to nitpick it.

The link works for me in incognito mode (it is a Google Drive file).

Jeremy

Huh, maybe someone else wants to weigh in? When I view in an incognito window, it prompts me to login. When I view it logged in, it says "You need access. Ask for access, or switch to an account with access." I'm not sure if you are the owner, but if so, you likely just need to click on "Share", then "Restricted" in the Get Link dialog (it doesn't really look like you can click there, but you can), then change the setting to "Anyone with the link".

Holden Karnofsky4y4

Thanks, this is helpful! I wasn't aware of that usage of "moral quasi-realism."

Personally, I find the question of whether principles can be described as "true" unimportant, and don't have much of a take on it. My default take is that it's convenient to sometimes use "true" in this way, so I sometimes do, while being happy to taboo it anytime someone wants me to or I otherwise think it would be helpful to.

I share a number of your intuitions as a starting point, but this dialogue (and previous ones) is intended to pose challenges to those intuitions. To follow up on those:

On Challenge 1A (and as a more general point) - if we take action against climate change, that presumably means making some sort of sacrifice today for the sake of future generations. Does your position imply that this is "simply better for some and worse for others, and not better or worse on the whole?" Does that imply that it is not particularly good or bad to take action on climate chan... (read more)

Erich_Grunewald 🔸

As you noticed, I limited the scope of the original comment to axiology (partly because moral theory is messier and more confusing to me), hence the handwaviness. Generally speaking, I trust my intuitions about axiology more than my intuitions about moral theory, because I feel like my intuition is more likely to "overfit" on more complicated and specific moral dilemmas than on more basic questions of value, or something in that vein. Anyway, I'll just preface the rest of this comment with this: I'm not very confident about all this and at any rate not sure whether deontology is the most plausible view. (I know that there are consequentialists who take person-affecting views too, but I haven't really read much about it. It seems weird to me because the view of value as tethered seems to resist aggregation, and it seems like you need to aggregate to evaluate and compare different consequences?) Since in deontology we can't compare two consequences and say which one is better, the answer depends on the action used to get there. I guess what matters is whether the action that brings about world X involves us doing or neglecting (or neither) the duties we have towards people in world X (and people alive now). Whether world X is good/bad for the population of world X (or for people alive today) only matters to the extent that it tells us something about our duties to those people. Example: Say we can do something about climate change either (1) by becoming benevolent dictators and implementing a carbon tax that way, or (2) by inventing a new travel simulation device, which reduces carbon emissions from flights but is also really addictive. (Assume the consequences of these two scenarios have equivalent expected utility, though I know the example is unfair since "dictatorship" sounds really bad -- I just couldn't think of a better one off the top of my head.) Here, I think the Kantian should reject (1) and permit or even recommend (2), roughly speaking because (2) resp

I think that's a fair point. These positions just pretty much end up in the same place when it comes to valuing existential risk.

That seems reasonable re: sentientism. I agree that there's no knockdown argument against lexicographic preferences, though I find them unappealing for reasons gestured at in this dialogue.

It's interesting that you have that intuition! I don't share it, and I think the intuition somewhat implies some of the "You shouldn't leave your house" type things alluded to in the dialogue.

Anthony DiGiovanni

I'm pretty happy to bite that bullet, especially since I'm not an egoist. I should still leave my house because others are going to suffer far worse (in expectation) if I don't do something to help, at some risk to myself. It does seem strange to say that if I didn't have any altruistic obligations then I shouldn't take very small risks of horrible experiences. But I have the stronger intuition that those horrible experiences are horrible in a way that the nonexistence of nice experiences isn't. And that "I" don't get to override the preference to avoid such experiences, when the counterfactual is that the preferences for the nice experiences just don't exist in the first place.