Andrew Critch

542 · Joined Dec 2018


Consequentialists (in society) should self-modify to have side constraints

I'm very pleased to see this line of reasoning being promoted.   Mutual transparency of agents (information permeability of boundaries) is a really important feature of & input to real-world ethics; thanks for expounding on it!

My Most Likely Reason to Die Young is AI X-Risk

Thanks for calling this out, whoever you are.  This is another reason why, for the last ~5 years or so, I've been pushing for the terminology "AI x-risk" and "AI x-safety" to replace "(long-term) AI risk" and "(long-term) AI safety".  For audiences not familiar with the "x" or "existential" terminology, one can say "large scale risk" to point to the large stakes, rather than always saying "long term".

(Also, the fact that I don't know who you are is actually fairly heartening :)

Simplify EA Pitches to "Holy Shit, X-Risk"

Neel, I agree with this sentiment, provided that it does not lead to extremist actions to prevent x-risk (see

Specifically, I agree that we should be explicit about existential safety, and in particular AI existential safety, as a broadly agreeable and understandable cause area that does not depend on EA, longtermism, or other niche communities/stances.  This is the main reason AI Research Considerations for Human Existential Safety (ARCHES) is explicitly about existential safety, rather than "AI safety" or other euphemistic / dog-whistley terms.

"Long-Termism" vs. "Existential Risk"

Scott, thanks so much for this post.  It's been years coming in my opinion.  FWIW, my reason for making ARCHES (AI Research Considerations for Human Existential Safety) explicitly about existential risk, and not about "AI safety" or some other glomarization, is that I think x-risk and x-safety are not long-term/far-off concerns that can be procrastinated away.  (with David Krueger)

Ideally, we need to engage as many researchers as possible, thinking about as many aspects of a functioning civilization as possible, to assess how A(G)I can creep into those corners of civilization and pose an x-risk, with cybersecurity / internet infrastructure and social media being extremely vulnerable fronts that are easily salient today.  

As I say this, I worry that other EAs will get worried that talking to folks working on cybersecurity or recommender systems necessarily means abandoning existential risk as a priority, because those fields have not historically taken x-risk seriously.   

However, for better or for worse, it's becoming increasingly easy for everyone to imagine cybersecurity and/or propaganda disasters involving very powerful AI systems, such that x-risk is increasingly not-a-stretch-for-the-imagination.  So, I'd encourage anyone who feels like "there is no hope to convince [group x] to care" to start re-evaluating that position (e.g., rather than aiming/advocating for drastic interventions like invasive pivotal acts).  I can't tell whether or not you-specifically are in the "there is no point in trying" camp, but others might be, and in any case I thought it might be good to bring up.

In summary: as tech gets scarier, we should have some faith that people will be more amenable to arguments that it is in fact dangerous, and re-examine whether this-group or that-group is worth engaging on the topic of existential safety as a near-term priority.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

the way I now think about these scenarios is that there's a tradeoff between technical ability and political ability

I also like this, and appreciate you pointing out a tradeoff where the discourse was presenting an either-or decision. I'd actually considered a follow-up post on the Pareto boundary between unilaterally maximizing (altruistic) utility and multilaterally preserving coordination boundaries and consent norms. Relating your ontology to mine, I'd say that in the AGI arena, technical ability contributes more to the former (unilaterally maximizing...) than the latter (multilaterally preserving...), and political ability contributes more to the latter than the former.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

saying "this test isn't legitimate" feels like a conceptual smudge that tries to blend all those claims together, as if each implied all of the others.

This is my favorite piece of feedback on this post so far, and I agree with it; thanks!

To clarify what I meant, I've changed the text to read "making evidence available to support global regulatory efforts from a broader base of consensual actors (see Part 3)."