Existential risk
Discussions of risks which threaten the destruction of the long-term potential of life

Quick takes

73
4mo
9
AI Safety Needs To Get Serious About Chinese Political Culture

I worry that Leopold Aschenbrenner's "China will use AI to install a global dystopia" take is based on crudely analogising the CCP to the USSR, or perhaps even to American cultural imperialism / expansionism, and isn't based on even a superficially informed analysis of either how China is currently thinking about AI or what its long-term political goals and values are.

I'm no expert either, but my impression is that China is much more interested in its own national security and in its own ideological notions of the ethnic Chinese people and Chinese territory, so that beyond e.g. Taiwan it has little interest in global domination, except to the extent that domination prevents it being threatened by other expansionist powers. This, or any number of other heuristics / judgements / perspectives, could substantially change how we think about whether China would race for AGI, and/or be receptive to the argument that AGI development is dangerous and should be suppressed. China clearly has a lot to gain from harnessing AGI, but it has a lot to lose too, just like the West.

Currently this is a pretty superficial impression of mine, so I don't think it would be fair to write an article yet. I need to do my homework first:

* I need to actually read Leopold's own writing about this, instead of forming impressions from summaries of it.
* I've been recommended to look into what CSET and Brian Tse have written about China.
* Perhaps there are other things I should hear about this; feel free to make recommendations.

Alternatively, as always, I'd be really happy for someone who's already done the homework to write about this, particularly anyone with expertise in Chinese political culture or international relations. Even if I write the article, all it'll really be able to be is an appeal to listen to experts in the field, or for one or more of those experts to step forward.
155
10mo
20
Mildly against the Longtermism --> GCR shift

Epistemic status: Pretty uncertain, somewhat rambly

TL;DR: Replacing longtermism with GCRs might get more resources to longtermist causes, but at the expense of non-GCR longtermist interventions and broader community epistemics.

Over the last ~6 months I've noticed a general shift amongst EA orgs towards focusing less on reducing risks from AI, bio, nukes, etc. based on the logic of longtermism, and more on Global Catastrophic Risks (GCRs) directly. Some data points on this:

* Open Phil renaming its EA Community Growth (Longtermism) Team to GCR Capacity Building
* This post from Claire Zabel (OP)
* Giving What We Can's new Cause Area Fund being named "Risk and Resilience," with the goal of "Reducing Global Catastrophic Risks"
* Longview-GWWC's Longtermism Fund being renamed the "Emerging Challenges Fund"
* Anecdotal data from conversations with people working on GCR / x-risk / longtermist causes

My guess is these changes are (almost entirely) driven by PR concerns about longtermism. I would also guess they increase the number of people donating to and working on GCRs, which is (by longtermist lights) a positive thing; after all, no one wants a GCR, even thinking only about people alive today. Yet I can't help but feel something is off about this framing. Some concerns (in no particular order):

1. From a longtermist (~totalist classical utilitarian) perspective, there's a huge difference between ~99% and 100% of the population dying, if humanity recovers in the former case but not the latter. Looking at GCRs on their own mostly misses this nuance (see Parfit's Reasons and Persons for the full thought experiment, and the toy calculation after this post).
2. From a longtermist (~totalist classical utilitarian) perspective, preventing a GCR doesn't differentiate between "humanity prevents GCRs and realises 1% of its potential" and "humanity prevents GCRs and realises 99% of its potential".
   * Preventing an extinction-level GCR might move us…
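To make concern 1 concrete, here is a toy expected-value calculation with entirely made-up numbers (the post itself gives none): on totalist assumptions, the step from 99% to 100% of the population dying destroys vastly more long-run value than the step from 0% to 99%, because only extinction forecloses recovery.

```python
# Toy illustration of the Parfit point in concern 1.
# All numbers are made up for illustration; none come from the post.

VALUE_PER_GENERATION = 1.0      # normalised value of one flourishing generation
FUTURE_GENERATIONS = 1_000_000  # assumed generations remaining if humanity survives
RECOVERY_COST = 10              # generations of value lost rebuilding after a 99% die-off

no_catastrophe = FUTURE_GENERATIONS * VALUE_PER_GENERATION
catastrophe_99 = (FUTURE_GENERATIONS - RECOVERY_COST) * VALUE_PER_GENERATION
extinction_100 = 0.0  # no survivors, so no recovery and no future value

# The 0% -> 99% step costs 10 units of long-run value;
# the 99% -> 100% step costs 999,990.
print(no_catastrophe - catastrophe_99)   # 10.0
print(catastrophe_99 - extinction_100)   # 999990.0
```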
76
5mo
4
This is a cold take that's probably been said before, but I thought it bears repeating occasionally, if only for the reminder:

The longtermist viewpoint has gotten a lot of criticism for prioritizing "vast hypothetical future populations" over the needs of "real people" alive today. The mistake, so the critique goes, is the result of replacing ethics with math, or utilitarianism, or something cold and rigid like that. And so it's flawed because it lacks the love or duty or "ethics of care" or concern for justice that leads people to alternatives like mutual aid and political activism.

My go-to reaction to this critique has become something like: "Well, you don't need to prioritize vast abstract future generations to care about pandemics or nuclear war; those are very real things that could, with non-trivial probability, face us in our lifetimes." I think this response has taken hold in general among people who talk about x-risk, and that probably makes sense for pragmatic reasons: it's a very good rebuttal to the "cold and heartless utilitarianism / Pascal's mugging" critique.

But I think it unfortunately neglects the critical point that longtermism, when taken really seriously (at least the sort of longtermism that MacAskill writes about in WWOTF, or that Joe Carlsmith writes about in his essays), is full of care and love and duty. Reading the thought experiment that opens the book, about living every human life in sequential order, reminded me of this. I wish more people responded to the "longtermism is cold and heartless" critique by making the case that, no, longtermism at face value is worth preserving because it's the polar opposite of heartless. Caring about the world we leave for real people, with emotions and needs and experiences as real as our own, who may very well inherit our world but whom we'll never meet, is an extraordinary act of empathy and compassion, one that's far harder to access than the empathy and warmth we might feel for our neighbors.
7
8d
I'm exploring the possibility of building an alignment research organization focused on augmenting alignment researchers and progressively automating alignment research (yes, I have thought deeply about differential progress and other concerns). I intend to seek funding in the next few months, and I'd like to chat with people interested in this kind of work, especially great research engineers and full-stack engineers who might want to cofound such an organization. If you or anyone you know might want to chat, let me know! Send me a DM, and I can send you some initial details about the organization's vision.

Here are some things I'm looking for in potential co-founders:

Need
* Strong software engineering skills

Nice-to-have
* Experience in designing LLM agent pipelines with tool-use
* Experience in full-stack development
* Experience in scalable alignment research approaches (automated interpretability / evals / red-teaming)
17
23d
1
It seems like some of the biggest proponents of SB 1047 are Hollywood actors and writers (e.g., Mark Ruffalo); you might remember them from last year's strike. I think the AI safety movement has a big opportunity to partner with organised labour, the way the animal welfare side of EA partnered with vegans. These are massive organisations with a lot of weight and mainstream power; if we can find ways to work with them, it's a big shortcut to building serious groundswell rather than going it alone. See also Yanni's work with voice actors in Australia. More of this!
50
3mo
2
The recently released 2024 Republican platform said they'll repeal the recent White House Executive Order on AI, which many in this community saw as a necessary first step toward making future AI progress safer and more secure. This seems bad. Source: https://s3.documentcloud.org/documents/24795758/read-the-2024-republican-party-platform.pdf (bottom of p. 9).
39
2mo
4
I just read Stephen Clare's excellent 80k article about the risks of stable totalitarianism. I've been interested in this area for some time (though my focus is somewhat different), and I'm really glad more people are working on this.

In the article, Stephen puts the probability that a totalitarian regime will control the world indefinitely at about 1 in 30,000. My probability on a totalitarian regime controlling a non-trivial fraction of humanity's future is considerably higher (though I haven't thought much about this). One point of disagreement may be the following. Stephen writes:

This is not clear to me. Stephen most likely understands the relevant topics far better than I do, but I worry that autocratic regimes often cooperate. This has happened historically (e.g., Nazi Germany, fascist Italy, and Imperial Japan) and also seems to be happening today: my sense is that Russia, China, Venezuela, Iran, and North Korea have formed some type of loose alliance, at least to some extent (see also Anne Applebaum's Autocracy, Inc.). Perhaps this doesn't apply to strictly totalitarian regimes (though it did for Germany, Italy, and Japan in the 1940s).

Autocratic regimes control a non-trivial fraction (like 20-25%?) of world GDP. A naive extrapolation could thus suggest that some type of coalition of autocratic regimes will control 20-25% of humanity's future (assuming these regimes don't reform themselves). Depending on the offense-defense balance (and on how people trade off reducing suffering/injustice against other values such as national sovereignty, non-interference, isolationism, and personal costs to themselves), this arrangement may very well persist.

It's unclear how much suffering such regimes would create; perhaps there would be fairly little. E.g., in China, ignoring political prisoners, the Uyghurs, etc., most people are probably doing fairly well (though a lot of people in, say, Iran aren't doing too well; see more below).
6
7d
I quickly wrote up some rough project ideas for ARENA and LASR participants, so I figured I'd share them here as well. I'm happy to discuss these ideas and potentially collaborate on some of them.

Alignment Project Ideas (Oct 2, 2024)

1. Improving "A Multimodal Automated Interpretability Agent" (MAIA)

Overview: MAIA (Multimodal Automated Interpretability Agent) is a system designed to help users understand AI models by combining human-like experimentation flexibility with automated scalability. It answers user queries about an AI system's components by iteratively generating hypotheses, designing and running experiments, observing outcomes, and updating hypotheses (a minimal sketch of this loop follows the list below). MAIA uses a vision-language model backbone (GPT-4V, at the time) equipped with an API of interpretability experiment tools. This modular system can address both "macroscopic" questions (e.g., identifying systematic biases in model predictions) and "microscopic" questions (e.g., describing individual features) with simple query modifications. This project aims to improve MAIA's ability to answer either macroscopic or microscopic questions about vision models.

2. Making "A Multimodal Automated Interpretability Agent" (MAIA) work with LLMs

MAIA is focused on vision models, so this project aims to create a MAIA-like setup for the interpretability of LLMs. Since this would require building a new setup for language models, it would make sense to come up with simple interpretability benchmark examples to test MAIA-LLM. The easiest way to do this would be either to look for existing LLM interpretability benchmarks or to create one based on interpretability results we've already verified (having a ground truth would be ideal). Ideally, the examples in the benchmark would be simple, but new enough that the LLM has not seen them in its training data.

3. Testing the robustness of Critique-out-Loud Reward (CLoud) Models

Critique-out-Loud reward models are reward models that can reason explicitly…
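For readers who haven't seen MAIA, here is a minimal sketch of the hypothesize-experiment-update loop described in idea 1. Everything in it is a hypothetical stand-in: `query_vlm`, `run_experiment`, and `AgentState` are placeholder names invented for illustration, not MAIA's actual API, and the real system wraps a much richer interpretability tool library.

```python
# Minimal sketch of a MAIA-style interpretability agent loop.
# Hypothetical names throughout: query_vlm and run_experiment are
# placeholders, not MAIA's real API.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    question: str                      # e.g. "What does unit 42 in layer 7 respond to?"
    hypotheses: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)

def query_vlm(prompt: str) -> str:
    """Placeholder for the vision-language-model backbone call."""
    raise NotImplementedError

def run_experiment(design: str) -> str:
    """Placeholder for the interpretability tool API (dataset exemplars,
    image editing, activation logging, etc.)."""
    raise NotImplementedError

def interpretability_agent(question: str, max_steps: int = 5) -> str:
    state = AgentState(question)
    for _ in range(max_steps):
        # 1. Generate or refine hypotheses given the evidence so far.
        state.hypotheses = query_vlm(
            f"Question: {state.question}\nEvidence: {state.evidence}\n"
            "Propose or refine hypotheses."
        ).splitlines()
        # 2. Design an experiment that discriminates between hypotheses.
        design = query_vlm(f"Design an experiment to test: {state.hypotheses}")
        # 3. Run it with the tool API and record the observed outcome.
        state.evidence.append(run_experiment(design))
        # 4. Stop once the model judges the evidence conclusive.
        if "CONCLUSIVE" in query_vlm(f"Is this evidence conclusive? {state.evidence}"):
            break
    return query_vlm(f"Final answer to {state.question} given {state.evidence}")
```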