Recent discussion

In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.

In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”

So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out...

Just noting that many of the “this concept is properly explained elsewhere” links are also accompanied by expandable boxes that you can click to expand for the gist. I do think that understanding where I’m coming from in this piece requires a bunch of background, but I’ve tried to make it as easy on readers as I could, e.g. explaining each concept in brief and providing a link if the brief explanation isn’t clear enough or doesn’t address particular objections.

Holden Karnofsky (7m)
Noting that I’m now going back through posts responding to comments, after putting off doing so for months - I generally find it easier to do this in bulk to avoid being distracted from my core priorities, though this time I think I put it off longer than I should’ve. It is generally true that my participation in comments is extremely sporadic/sparse, and folks should factor that into curation decisions.

In November, I wrote about Open Philanthropy’s soft pause of new longtermist funding commitments:

We will have to raise our bar for longtermist grantmaking: with more funding opportunities that we’re choosing between, we’ll have to fund a lower percentage of them. This means grants that we would’ve made before might no longer be made, and/or we might want to provide smaller amounts of money to projects we previously would have supported more generously ...

Open Philanthropy also need[s] to raise its bar in light of general market movements (particularly the fall in META stock) and other factors ... the longtermist community has been growing; our rate of spending has been going up; and we expect both of these trends to


I expect more funding discontinuations than usual, but we generally try to discontinue funding in a way that gives organizations time to plan around the change.

I’m not leading the longer-term process. I expect Open Philanthropy will publish content about it, but I’m not sure when.

Holden Karnofsky (11m)
I don’t have a good answer, sorry. The difficulty of getting cardinal estimates for longtermist grants is a lot of what drove our decision to go with an ordinal approach instead.
Holden Karnofsky (12m)
Aiming to spend down in less than 20 years would not obviously be justified even if one’s median for transformative AI timelines were well under 20 years. This is because we may want extra capital in a “crunch time” where we’re close enough to transformative AI for the strategic picture to have become a lot clearer, and because even a 10-25% chance of longer timelines would provide some justification for not spending down on short time frames. This move could be justified if the existing giving opportunities were strong enough even with a lower bar. That may end up being the case in the future. But we don’t feel it’s the case today, having eyeballed the stack rank.

In previous pieces, I argued that there's a real and large risk of AI systems' aiming to defeat all of humanity combined - and succeeding.

I first argued that this sort of catastrophe would be likely without specific countermeasures to prevent it. I then argued that countermeasures could be challenging, due to some key difficulties of AI safety research.

But while I think misalignment risk is serious and presents major challenges, I don’t agree with sentiments[1] along the lines of “We haven’t figured out how to align an AI, so if transformative AI comes soon, we’re doomed.” Here I’m going to talk about some of my high-level hopes for how we might end up avoiding this risk.

I’ll first recap the challenge, using Ajeya Cotra’s young businessperson analogy...

My point with the observation you quoted wasn't "This would be unprecedented, therefore there's a very low prior probability." It was more like: "It's very hard to justify >90% confidence on anything without some strong base rate to go off of. In this case, we have no base rate to go off of; we're pretty wildly guessing." I agree something weird has to happen fairly "soon" by zoomed-out historical standards, but there are many possible candidates for what the weird thing is (I also endorse dsj's comment below).



EA Wellington (EAW), in New Zealand, has been iterating on the same basic stall idea since 2021. We have found this low-cost method to be effective at encouraging engagement at the stall, with mixed results in getting people to then turn up to our regular events, and unknown impact on people’s long-term thinking.

This post outlines how we ran the stall at a recent Clubs’ Day at our local university, including the thinking behind some of our decisions, what we learnt from the process, and some things you might want to consider if you’re running a stall.

This post was written by me (Tom) with feedback from the rest of the EAW exec. I use “we” a lot, and when expressing opinions this generally means “I think this and...

Thanks for sharing—this seems like a good strategy. I'm curious what people said when you asked whether they had heard of EA; like, what percent had, and of those, what percent had a positive/neutral/negative impression?

Thank you for going through the effort of writing this up! Ditto this experience at EA UNSW (Australia): a successful stall (we've also been iterating on a setup similar to yours) but difficulty translating that into regular event attendance (about a handful). New member influx tends to come from catching folks who have discovered EA via the internet or through event collaborations with other related societies. Tabling at the start of the year has caught such EA internet lurkers.

Note: manually cross-posted from LessWrong. See here for discussion on LW.


I recently watched Eliezer Yudkowsky's appearance on the Bankless podcast, where he argued that AI was nigh-certain to end humanity. Since the podcast, some commentators have offered pushback against the doom conclusion. However, one sentiment I saw was that optimists tended not to engage with the specific arguments pessimists like Yudkowsky offered. 

Economist Robin Hanson points out that this pattern is very common for small groups which hold counterintuitive beliefs: insiders develop their own internal language, which skeptical outsiders usually don't bother to learn. Outsiders then make objections that focus on broad arguments against the belief's plausibility, rather than objections that focus on specific insider arguments.

As an AI "alignment insider" whose current estimate of doom is around 5%,...

GiveWell and Open Philanthropy

GiveWell and Open Philanthropy are sister organizations in the effective altruism community. Both seek to identify outstanding giving opportunities, but they use different criteria and processes. 

GiveWell emphasizes evidence-backed organizations within the global health and wellbeing space, while Open Philanthropy also supports high-risk, high-reward work, as well as work that could take a long time to pay off, in a variety of cause areas. We think this illustrates interesting methodological differences between attempts to answer the question “How can we do the most good?”


The Moral Imperative toward Cost-Effectiveness in Global Health - Center for Global Development (20 mins.)

Hi! This link is broken. Could someone update it? Here's the new one:

"We don’t usually think of achievements in terms of what would have happened otherwise, but we should. What matters is not who does good but whether good is done; and the measure of how much good you achieve is the difference between what happens as a result of your actions and what would have happened anyway." - William MacAskill, Doing Good Better


Counterfactual thinking is fundamental to economic thinking, and this approach has been incorporated into Effective Altruism. The actual impact of your choices is based on what changed, not what happened. Per the forum wiki summary, "Counterfactual reasoning involves scenarios that will occur if an agent chooses a certain action, or that would have occurred if an agent had chosen an action they did not."

In this post,...

Do you mean maximizing the sum of Shapley values or just your own Shapley value? I had the latter in mind. I might be mistaken about the specific perverse examples even under that interpretation, since I'm not sure how Shapley values are meant to be used. Maximizing your own Shapley value seems to bring in a bunch of counterfactuals (i.e. your counterfactual contribution to each possible coalition) and weigh them ignoring propensities to cooperate/coordinate, though.

On the other hand, the sum of Shapley values is just the value (your utility?) of the "gran...

I'm torn on this post: while I agree with the overall spirit (that EAs can do better at cooperation and counterfactuals, and be more prosocial), I think the post makes some strong claims/assumptions which I disagree with. I find it problematic that these assumptions are stated like they are facts.

First, EA may be better at "internal" cooperation than other groups, but cooperation is hard and internal EA cooperation is far from perfect.

Second, the idea that correctly assessed counterfactual impact is hyperopic. Nope, hyperopic assessments are just a sign of not getting your counterfactual right.

Third, the idea that Shapley values are the solution. I like Shapley values but only within the narrow constraints for which they are well specified. That is, environments where cooperation should inherently be possible: when all agents agree on the value that is being created. In general you need an approach that can handle both cooperative and adversarial environments and everything in between. I'd call that general approach counterfactual impact. I see another commenter has noted Toby's old comments about this and I'll second that.

Finally, economists may do more counterfactual reasoning than other groups but that doesn't mean they have it all figured out. Ask your average economist to quickly model a counterfactual and it could easily end up being myopic or hyperopic too. The solution is really to get all analysts better trained on heuristics for reasoning about counterfactuals in a way that is prosocial. To me that is what you get to if you try to implement philosophies like Toby's global consequentialism. But we need more practical work on things like this, not repetitive claims about Shapley values.

I'm writing quickly and hope this comes across in the right spirit. I do find the strong claims in this post frustrating to see, but I welcome that you raised the topic.

If the value of success is X, and the cost of each group pursuing the intervention is Y, then ideally we would want to pick N (the number of groups that will pursue the intervention) from the possible values 0, 1, 2 or 3, so as to maximize:

(1 - (1/2)^N) X - N Y

i.e., to maximize expected value.

If all 3 groups have the same goals, they'll all agree what N is. If N is not 0 or 3, then the best thing for them to do is to get together and decide which of them will pursue the intervention, and which of them won't, in order to get the optimum N. They can base their decision of how to allocate the groups on secondary factors (or by chance if everything else really is equal). If they all have the same goals then there's no game theory here. They'll all be happy with this, and they'll all be maximizing their own individual counterfactual expected value by taking part in this coordination.

This is what I mean by coordination. The fact that their individual approaches are different is irrelevant to them benefiting from this form of coordination.

'Maximize Shapley value' will perform worse than this strategy. For example, suppose X is 8, Y is 2. An optimum value of N for expected value is then 2 (2 groups pursue the intervention, 1 doesn't; N = 1 gives the same expected value). But using Shapley values, I think you find that whatever N is, the Shapley value of your contribution is always >2. So whatever every other group is doing, each group should decide to take part, and we then end up at N=3, which is sub-optimal.
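
The arithmetic in the example above can be checked with a short script. This is only a sketch under stated assumptions: each group is taken to succeed independently with probability 1/2 (which is what the (1/2)^N term suggests), the Shapley value is computed over the expected benefit alone and then compared against the cost Y, and the function names are illustrative rather than taken from the comment.

```python
from fractions import Fraction
from itertools import permutations


def success_value(n, x):
    """Expected benefit when n groups each succeed independently with probability 1/2."""
    return (1 - Fraction(1, 2) ** n) * x


def expected_value(n, x, y):
    """Expected benefit minus total cost when n groups pursue the intervention."""
    return success_value(n, x) - n * y


def shapley_share(n, x):
    """Shapley value of one of n identical participants.

    Averages the participant's marginal contribution to the expected
    benefit over every possible joining order (assumption: costs are
    not part of the coalition value).
    """
    if n == 0:
        return Fraction(0)
    orders = list(permutations(range(n)))
    total = Fraction(0)
    for order in orders:
        position = order.index(0)  # how many groups joined before participant 0
        total += success_value(position + 1, x) - success_value(position, x)
    return total / len(orders)


X, Y = 8, 2  # values used in the example above
for n in range(4):
    print(n, expected_value(n, X, Y), shapley_share(n, X))
```

With X = 8 and Y = 2 the expected values for N = 0, 1, 2, 3 come out as 0, 2, 2 and 1, so N = 1 and N = 2 tie and N = 3 is worse. The per-participant Shapley values for N = 1, 2, 3 are 4, 3 and 7/3, each above the cost of 2, which matches the claim that a group maximizing its own Shapley value would always choose to take part.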

Or alternatively: “A message for 2018-me: quit your job and start doing ‘EA work’”

2022/06/29 update since writing this: Claire Boine and I have made a website advertising career advice for mid-career people. Please consider applying for advice through there if this article resonates with you! (The advice will be from either me or Claire.)

Note: I wrote this post a bit quickly (~9 hours) so it’s a little rough, and likely contains minor errors.

In this post I want to provide encouragement and information for mid-career people[1] who are sympathetic to EA ideas but haven’t seriously tried doing EA work. Basically, I think there’s tons of amazingly impactful, fun, well-compensated work that skilled mid-career people could do, and that this is maybe much less obvious from the “outside” than it is...

With this post almost a year old now, I was curious if any of the commenters who were interested in switching to EA-related work have pursued this route. If so:

  • Have you been hired?
  • What was the job seeking process like for you?
  • Any recommendations to other mid-career professionals looking to pursue this path?

I feel very warmly about using relatively quick estimates to carry out sanity checks, i.e., to quickly check whether something is clearly off, whether some decision is clearly overdetermined, or whether someone is just bullshitting. This is in contrast to Fermi estimates, which aim to arrive at an estimate for a quantity of interest, and which I also feel warmly about but which aren’t the subject of this post. In this post, I explain why I like quantitative sanity checks so much, and I give some examples.

Why I like this so much

I like this so much because:

  • It is very defensible. There are some cached arguments against more quantified estimation, but sanity checking cuts through most—if not all—of them. “Oh, well, I just think that estimation has some