Executive Director @ Berkeley Existential Risk Initiative
1066 karma · Joined · Working (6-15 years) · New York, NY, USA


  • Attended an EA Global conference
  • Attended more than three meetings with a local EA group
  • Received career coaching from 80,000 Hours


Topic contributions

I think people would say that the dog was stronger and faster than all previous dog breeds, not that it was "more capable". It's in fact significantly less capable at not attacking its owner, which is an important dog capability. I just think the language of "capability" is somewhat idiosyncratic to AI research and industry, and I'm arguing that it's not particularly useful or clarifying language.

More to my point (though probably orthogonal to your point), I don't think many people would buy this dog, because most people care more about not getting attacked than they do about speed and strength.

As a side note, I don't see why preferences and goals change any of this. I'm constantly hearing AI (safety) researchers talk about "capabilities research" on today's AI systems, but I don't think most of them think those systems have their own preferences and goals. At least not in the sense that a dog has preferences or goals. I just think it's a word that AI [safety?] researchers use, and I think it's unclear and unhelpful language.


What is "capabilities"? What is "safety"? People often talk about the alignment tax: the magnitude of capabilities/time/cost a developer loses by implementing an aligned/safe system. But why should we consider an unaligned/unsafe system "capable" at all? If someone developed a commercial airplane that went faster than anything else on the market, but it exploded on 1% of flights, no one would call that a capable airplane.

This idea overlaps with safety culture and safety engineering and is not new. But alongside recent criticism of the terms "safety" and "alignment", I'm starting to think that the term "capabilities" is unhelpful, capturing different things for different people.

I played the paperclips game 6-12 months before reading Superintelligence (which is what convinced me to prioritize AI x-risk), and I think the game made these ideas easier for me to understand and internalize.

This is truly crushing news. I met Marisa at a CFAR workshop in 2020. She was open, kind, and grateful to everyone, and it was joyful to be around her. I worked with her a bit on revitalizing the EA Operations Slack Workspace in 2020, and had only had a few conversations with her since then, here and there at EA events. Marisa (like many young EAs) made me excited for a future that would benefit from her work, ambition, and positivity. Now she's gone. She was a good person, I'm glad she was alive, and I am so sad she's gone.

Good reasoning, well written. Reading this post convinced me to join the next NYC protest. Unfortunately I missed the one literally two days ago because I waited too long to read this. But I plan to be there in September.


One thing I think is often missing from these sorts of conversations is that "alignment with EA" and "alignment with my organization's mission" are not the same thing! It's a mistake to assume that the only people who understand and believe in your organization’s mission are members of the effective altruism community. EA ideas don’t have to come in a complete package. People can believe that one organization’s mission is really valuable and important, for different reasons, coming from totally different values, and without also believing that a bunch of other EA organizations are similarly valuable.

For "core EA" orgs like the Centre for Effective Altruism[1], there's probably near-total overlap between these two things. But for lots of other organizations the overlap is only incidental, and what you should really be looking for is "alignment with my organization's mission". Perceived EA Alignment is an unreliable measure of that, and it is also correlated with a bunch of other things like culture, thinking style, network, and socioeconomic status, each of which you either don't care about or actively don't want to be selecting for in the first place.

  1. ^

Within EA, work on x-risk is very siloed by type of threat: There are the AI people, the bio people, etc. Is this bad, or good?

Which of these is the correct analogy?

  1. "Biology is to science as AI safety is to x-risk," or 
  2. "Immunology is to biology as AI safety is to x-risk"

EAs seem to implicitly think analogy 1 is correct: some interdisciplinary work is nice (biophysics) but most biologists can just be biologists (i.e. most AI x-risk people can just do AI).

The "existential risk studies" model (popular with CSER, SERI, and lots of other non-EA academics) seems to think that analogy 2 is correct, and that interdisciplinary work is totally critical. Immunologists alone cannot achieve a useful understanding of the entire system they're trying to study; they need to exchange ideas with other subfields of medicine and biology in order to have an impact. By the same logic, AI x-risk workers are missing critical pieces of the puzzle when they neglect broader x-risk studies.

I agree with your last sentence, and I think in some versions of this it's the vast majority of people. A lot of charity advertising seems to encourage a false sense of confidence, e.g. "Feed this child for $1," or "adopt this manatee". I think this makes use of a near-universal human bias which probably has a name but which I am not recalling at the moment. For a less deceptive version of this, note how much effort AMF and GiveDirectly seem to have put into tracking the concrete impact of your specific donation.


Building off of Jason's comment: Another way to express this is that comparing directly to the $5,500 GiveWell bar is only fair for risk-neutral donors (I think?). Most potential donors are not really risk neutral, and would rather spend $5,001 to definitely save one life than $5,000 to have a 10% chance of saving 10 lives, even though both options save one life in expectation (10% × 10 = 1). Risk neutrality is a totally defensible position, but so is non-neutrality. It's good to have the option of paying a "premium" for a higher confidence (but lower risk-neutral EV).
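To make the arithmetic concrete, here is a minimal sketch comparing the two donation options from the comment. The `Option` type and helper functions are hypothetical names for illustration, and the square-root utility is an assumed stand-in for risk aversion, not something the comment specifies. A risk-neutral donor scores options by expected lives saved; a risk-averse donor scores them by the expected value of a concave utility of lives saved.

```python
import math
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Option:
    """A donation option: its cost and a list of (probability, lives_saved) outcomes."""
    name: str
    cost: float
    outcomes: list[tuple[float, float]]


def expected_lives(option: Option) -> float:
    # Risk-neutral view: probability-weighted average of lives saved.
    return sum(p * lives for p, lives in option.outcomes)


def cost_per_expected_life(option: Option) -> float:
    return option.cost / expected_lives(option)


def expected_utility(option: Option, utility: Callable[[float], float]) -> float:
    # Risk-averse view: average the utility of each outcome, not the raw count.
    return sum(p * utility(lives) for p, lives in option.outcomes)


certain = Option("certain", 5001, [(1.0, 1)])
gamble = Option("gamble", 5000, [(0.1, 10), (0.9, 0)])

# Hypothetical concave utility standing in for risk aversion (an assumption).
concave = math.sqrt

for opt in (certain, gamble):
    print(f"{opt.name}: E[lives]={expected_lives(opt):.2f}, "
          f"$/expected life={cost_per_expected_life(opt):,.0f}, "
          f"E[sqrt utility]={expected_utility(opt, concave):.3f}")
```

Under linear utility the two options tie on expected lives and the gamble is $1 cheaper per expected life, which is why the GiveWell-bar comparison suits risk-neutral donors. Under the assumed square-root utility, the certain option scores higher (1.0 vs. roughly 0.316), which is one way to formalize paying a "premium" for confidence.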

Leaving math mode...I love this post. It made me emotional and also made me think, and it feels like a really central example of what EA should be about. I'm very impressed by your resolve here in following through with this plan, and I'm really glad to have people like you in this community.

Very nice post. "Anarchists have no idols" strikes me as very similar to the popular anarchist slogan, "No gods, no masters." Perhaps the person who said it to you was riffing on that?
