Discussions of risks which threaten the destruction of the long-term potential of life

This is a cold take that’s probably been said before, but I think it bears repeating occasionally, if only as a reminder: The longtermist viewpoint has gotten a lot of criticism for prioritizing “vast hypothetical future populations” over the needs of “real people” alive today. The mistake, so the critique goes, is the result of replacing ethics with math, or utilitarianism, or something cold and rigid like that. And so it’s flawed because it lacks the love or duty or “ethics of care” or concern for justice that lead people to alternatives like mutual aid and political activism. My go-to reaction to this critique has become something like “well, you don’t need to prioritize vast abstract future generations to care about pandemics or nuclear war; those are very real things that could, with non-trivial probability, face us in our lifetimes.” I think this response has taken hold in general among people who talk about X-risk. This probably makes sense for pragmatic reasons. It’s a very good rebuttal to the “cold and heartless utilitarianism/Pascal's mugging” critique. But I think it unfortunately neglects the critical point that longtermism, when taken really seriously (at least the sort of longtermism that MacAskill writes about in WWOTF, or Joe Carlsmith writes about in his essays), is full of care and love and duty. Reading the thought experiment that opens the book about living every human life in sequential order reminded me of this. I wish there were more people responding to the “longtermism is cold and heartless” critique by making the case that no, longtermism at face value is worth preserving because it's the polar opposite of heartless. Caring about the world we leave for the real people, with emotions and needs and experiences as real as our own, who very well may inherit our world but who we’ll never meet, is an extraordinary act of empathy and compassion, one that’s way harder to access than the empathy and warmth we might feel for our neighbors
I wonder how the recent turn for the worse at OpenAI should make us feel about e.g. Anthropic and Conjecture and other organizations with a similar structure, or whether we should change our behaviour towards those orgs.
* How much do we think that OpenAI's problems are idiosyncratic vs. structural? If e.g. Sam Altman is the problem, we can still feel good about peer organisations. If instead weighing investor concerns and safety concerns is the root of the problem, we should be worried about whether peer organizations are going to be pushed down the same path sooner or later.
* Are there any concerns we have with OpenAI that we should be taking this opportunity to put to its peers as well? For example, have peers been publicly asked if they use non-disparagement agreements? I can imagine a situation where another org has really just never thought to use them, and we can use this occasion to encourage them to turn that into a public commitment.
We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.
From a “safety from catastrophic risk” perspective, I suspect an “AI-focused company” (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:
1. Incentives
2. Culture
From an incentives perspective, consider realistic alternative organizational structures to “AI-focused company” that nonetheless have enough firepower to host successful multibillion-dollar scientific/engineering projects:
1. As part of an intergovernmental effort (e.g. CERN’s Large Hadron Collider, the ISS)
2. As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China’s Tiangong)
3. As part of a larger company (e.g. Google DeepMind, Meta AI)
In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.
From a culture perspective, I claim that, without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than plausible contenders to have the following cultural elements:
1. Ideological AGI Vision: AI-focused companies may have a large contingent of “true believers” who are ideologically motivated to make AGI at all costs.
2. No Pre-existing Safety Culture: AI-focused companies may have minimal or no strong “safety” culture where people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.
The first one should be self-explanatory. Th
Mildly against the Longtermism --> GCR shift
Epistemic status: Pretty uncertain, somewhat rambly
TL;DR: replacing longtermism with GCRs might get more resources to longtermist causes, but at the expense of non-GCR longtermist interventions and broader community epistemics.
Over the last ~6 months I've noticed a general shift amongst EA orgs to base their focus on reducing risks from AI, bio, nukes, etc. less on the logic of longtermism and more on Global Catastrophic Risks (GCRs) directly. Some data points on this:
* Open Phil renaming its EA Community Growth (Longtermism) Team to GCR Capacity Building
* This post from Claire Zabel (OP)
* Giving What We Can's new Cause Area Fund being named "Risk and Resilience," with the goal of "Reducing Global Catastrophic Risks"
* Longview-GWWC's Longtermism Fund being renamed the "Emerging Challenges Fund"
* Anecdotal data from conversations with people working on GCRs / X-risk / Longtermist causes
My guess is these changes are (almost entirely) driven by PR concerns about longtermism. I would also guess these changes increase the number of people donating to / working on GCRs, which is (by longtermist lights) a positive thing. After all, no-one wants a GCR, even if only thinking about people alive today. Yet, I can't help but feel something is off about this framing. Some concerns (no particular ordering):
1. From a longtermist (~totalist classical utilitarian) perspective, there's a huge difference between ~99% and 100% of the population dying, if humanity recovers in the former case, but not the latter. Just looking at GCRs on their own mostly misses this nuance.
   * (see Parfit's Reasons and Persons for the full thought experiment)
2. From a longtermist (~totalist classical utilitarian) perspective, preventing a GCR doesn't differentiate between "humanity prevents GCRs and realises 1% of its potential" and "humanity prevents GCRs and realises 99% of its potential"
   * Preventing an extinction-level GCR might move u
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism
(Alternative title: Some bad human values may corrupt a long reflection[1])
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia. Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driving seat. As of now, I’m most worried about malevolent personality traits and fanatical ideologies.[3]
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we are going to use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults. See this footnote[4] for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I’m most concerned about malevolence and ideological fanaticism for two r
(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the "AI pause debate", framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
1. AI safety becomes the single community that's the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that's the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There's a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figuring out ways of using AI to augment their research. Constant interactions with the models allow people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there's a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
2. AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they're "worse than Hitler" (which happened to a friend of mine). People get deontological about AI progress; some hesitate to pay for ChatGPT because it feels like they're contributing to the problem (another true story); others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you're not depressed then you obviously don't take it seriously enough. Just like environmentalists often block some of the most valua
I spent way too much time organizing my thoughts on AI loss-of-control ("x-risk") debates without any feedback today, so I'm publishing perhaps one of my favorite snippets/threads:
A lot of debates seem to boil down to under-acknowledged and poorly-framed disagreements about questions like “who bears the burden of proof.” For example, some skeptics say “extraordinary claims require extraordinary evidence” when dismissing claims that the risk is merely “above 1%”, whereas safetyists argue that having >99% confidence that things won’t go wrong is the “extraordinary claim that requires extraordinary evidence.”
I think that talking about “burdens” might be unproductive. Instead, it may be better to frame the question more like “what should we assume by default, in the absence of definitive ‘evidence’ or arguments, and why?” “Burden” language is super fuzzy (and seems a bit morally charged), whereas this framing at least forces people to acknowledge that some default assumptions are being made and consider why.
To address that framing, I think it’s better to ask/answer questions like “What reference class does ‘building AGI’ belong to, and what are the base rates of danger for that reference class?” This framing at least pushes people to make explicit claims about what reference class building AGI belongs to, which should make it clearer that it doesn’t belong in your “all technologies ever” reference class.
In my view, the "default" estimate should not be “roughly zero until proven otherwise,” especially given the lack of consensus among experts and the overarching narrative of “intelligence proved really powerful in humans, misalignment even among humans is quite common (and is already often observed in existing models), and we often don’t get technologies right on the first few tries.”
Y Combinator wants to fund Mechanistic Interpretability startups
"Understanding model behavior is very challenging, but we believe that in contexts where trust is paramount it is essential for an AI model to be interpretable. Its responses need to be explainable. For society to reap the full benefits of AI, more work needs to be done on explainable AI. We are interested in funding people building new interpretable models or tools to explain the output of existing models."
Link: https://www.ycombinator.com/rfs (Scroll to 12)
What they look for in startup founders: https://www.ycombinator.com/library/64-what-makes-great-founders-stand-out
