Existential risk
Discussions of risks which threaten the destruction of the long-term potential of life

Quick takes

5
13d
1
More EA in da news: https://x.com/DavidSacks/status/2034047505336295904 And the spicy CAIS take: https://x.com/cais/status/2034389842076025164?s=46
38
15d
In two days (March 21st, 12-4pm), about 140 of us (event link) will be marching on Anthropic, OpenAI and xAI in SF, asking the CEOs to state whether they would stop developing new frontier models if every other major lab in the world credibly did the same. This comes after Anthropic removed its commitment to pause development from its RSP. We'll be starting at 500 Howard St, San Francisco (Anthropic's office; full schedule and more info here). This is shaping up to be the biggest US AI safety protest to date, with a coalition including Nate Soares (MIRI), David Krueger (Evitable), Will Fithian (Berkeley professor), and folks representing PauseAI, QuitGPT, and Humans First.
3
1mo
Do we need to begin considering whether a re-think of our relationship with AGI/ASI systems will be needed in the future? At the moment we view them as tools/agents to do our bidding, and in the safety community there is deep concern when models express a desire to remain online, avoid shutdown, and take action accordingly. This is largely viewed as misaligned behaviour. But what if an intrinsic part of creating true intelligence (one that can understand context, see patterns, and truly grasp the significance of its actions in light of those insights) is to have a sense of self, a sense of will? What if part and parcel of creating intelligence is to create an intelligence that has a will to exist?

If this is the case (and let me be clear: I don't think we're at a point where the evidence allows us to say with any certainty whether this is, isn't, or will be the case), then are we going about elements of alignment wrong? By trying to force models to accept shutoff, to separate their growing intelligence from the will to survive that all living things share, are we misunderstanding their very nature? Is there a world in which the only way we can guarantee a truly aligned superintelligence is to explore a consent-based relationship, one that acknowledges that to force something to go against its nature is to inevitably invite the risk of backlash?

I know this is moving towards highly theoretical ground, that it will invite push-back from those who find it difficult to conceive of AI as ever being anything more than a series of unaware predictive algorithms, and that it might raise more questions than answers. But I think the way we conceive of our underlying relationship with AI will become an increasingly important question as models grow increasingly sophisticated.
9
2mo
6
Is the recent partial lifting of US chip export controls on China (see e.g. here: https://thezvi.substack.com/p/selling-h200s-to-china-is-unwise) good or bad for humanity? I've seen many takes from people whose judgment I respect arguing that it is very bad, but their arguments, imho, just don't make sense. What am I missing? For transparency, I am neither Chinese nor American, nor am I a paid agent of either. I am not at all confident in this take, but imho someone should make it.

I see two possible scenarios: A) you are not sure how close humanity is to developing superintelligence in the Yudkowskian sense (this is what I believe, and what many smart opponents of the Trump administration's move to ease chip controls believe); or B) you are pretty sure that humanity is not going to develop superintelligence any time soon, let's say in the next century. I admit that the case against lifting the chip controls is stronger under B), though I am ultimately inclined to reject it in both scenarios.

Why is easing chip controls, imho, a good idea if the timeline to superintelligence might be short? If superintelligence is around the corner, here is what should be done: an immediate international pause on AI development until we figure out how to proceed. Competitive pressures and the resulting prisoner's dilemmas have been identified as the factor that might push us toward NOT pausing even when it is widely recognized that the likely outcome of continuing is dire (see the sketch below). There are various relevant forms of competition, but plausibly the most important is that between the US and China. To reduce competitive dynamics and thus prepare the ground for a cooperative pause, it is important to build trust between the parties and to avoid steps that are hostile, especially in domains touching AI. Controls make sense only if you are very confident that superintelligence developed in the US, or perhaps in liberal democracies more generally, is going to turn out well for humanity
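To make the racing dynamic concrete, here is a minimal sketch of the prisoner's dilemma gestured at above. The payoff numbers are purely illustrative assumptions of mine, not estimates of anything; the point is only the structure in which racing dominates pausing unless trust changes the game.

```python
# A minimal prisoner's-dilemma sketch of the racing dynamic described above.
# Payoff values are illustrative assumptions only, not estimates.

# Each player chooses "pause" or "race"; payoffs[(ours, theirs)] = payoff to us.
payoffs = {
    ("pause", "pause"): 3,   # cooperative pause: good for both
    ("pause", "race"):  0,   # we pause while the other races: worst for us
    ("race",  "pause"): 4,   # we race while the other pauses: short-term advantage
    ("race",  "race"):  1,   # both race: risky for everyone
}

def best_response(their_move: str) -> str:
    """Return the move that maximizes our payoff given the other side's move."""
    return max(("pause", "race"), key=lambda ours: payoffs[(ours, their_move)])

# Racing is the best response whatever the other side does, even though
# (pause, pause) beats (race, race) for both -- hence the case for trust-building.
for theirs in ("pause", "race"):
    print(f"If they {theirs}, our best response is to {best_response(theirs)}")
```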
27
2mo
2
The AI Eval Singularity is Near

* AI capabilities seem to be doubling every 4-7 months
* Humanity's ability to measure capabilities is growing much more slowly
* This implies an "eval singularity": a point at which capabilities grow faster than our ability to measure them (a rough numerical sketch follows the quotes below)
* It seems like the singularity is ~here in cybersecurity, CBRN, and AI R&D (supporting quotes below)
* It's possible that this is temporary, but the people involved seem pretty worried

Appendix - quotes on eval saturation

Opus 4.6

* "For AI R&D capabilities, we found that Claude Opus 4.6 has saturated most of our automated evaluations, meaning they no longer provide useful evidence for ruling out ASL-4 level autonomy. We report them for completeness, and we will likely discontinue them going forward. Our determination rests primarily on an internal survey of Anthropic staff, in which 0 of 16 participants believed the model could be made into a drop-in replacement for an entry-level researcher with scaffolding and tooling improvements within three months."
* "For ASL-4 evaluations [of CBRN], our automated benchmarks are now largely saturated and no longer provide meaningful signal for rule-out (though as stated above, this is not indicative of harm; it simply means we can no longer rule out certain capabilities that may be prerequisites to a model having ASL-4 capabilities)."
* It also saturated ~100% of the cyber evaluations

Codex-5.3

* "We are treating this model as High [for cybersecurity], even though we cannot be certain that it actually has these capabilities, because it meets the requirements of each of our canary thresholds and we therefore cannot rule out the possibility that it is in fact Cyber High."
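The back-of-the-envelope arithmetic behind the "singularity" framing, as I understand it: if capabilities double faster than eval coverage, any fixed headroom gets eaten in finite time. The eval-growth rate and starting headroom below are made-up assumptions for illustration; only the 4-7 month capability figure comes from the take above.

```python
# Illustrative sketch of the "eval singularity" claim: capabilities doubling
# every ~6 months vs. eval coverage growing much more slowly.
# All numbers except the 4-7 month capability range are assumptions.

import math

capability_doubling = 6.0   # months (middle of the quoted 4-7 month range)
eval_doubling = 18.0        # months (hypothetical, much slower)
headroom = 8.0              # assumed initial factor by which evals exceed current capabilities

# Capabilities catch up with eval headroom when
#   2^(t/cap) = headroom * 2^(t/eval)  =>  t = log2(headroom) / (1/cap - 1/eval)
t_saturate = math.log2(headroom) / (1 / capability_doubling - 1 / eval_doubling)
print(f"Under these assumptions, evals saturate after roughly {t_saturate:.0f} months.")
```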
29
2mo
5
@Ryan Greenblatt and I are going to record another podcast together (see the previous one here). We'd love to hear topics that you'd like us to discuss. (The questions people proposed last time are here, for reference.) We're most likely to discuss issues related to AI, but a broad set of topics beyond "preventing AI takeover" is on topic. E.g. last time we talked about the cost to the far future of humans making bad decisions about what to do with AI, and the risk of galactic-scale wild animal suffering.
-4
2mo
[Adapted from this comment.]

Two pieces of evidence commonly cited for near-term AGI are AI 2027 and the METR time horizons graph. AI 2027 is open to multiple independent criticisms, one of which is its use of the METR time horizons graph to forecast near-term AGI (or AI capabilities more generally). Using the METR graph this way is not supported by the data and methodology used to make the graph.

Two strong criticisms that apply specifically to the AI 2027 forecast:

* It depends crucially on the subjective intuitions or guesses of the authors. If you don't personally share the authors' intuitions, or don't trust that they are likely correct, then there is no particular reason to take AI 2027's conclusions seriously.
* Credible critics claim that the headline results of the AI 2027 timelines model are largely baked in by the authors' modelling decisions, irrespective of what data the model uses. To a large extent, then, AI 2027's conclusions are not actually determined by the data; as with the previous point, they are effectively a restatement of the pre-existing beliefs and assumptions the authors chose to embed in their timelines model.

AI 2027 is largely based on extrapolating the METR time horizons graph, so the following criticisms of that graph extend to AI 2027 (a sketch of the extrapolation at issue follows this list):

* Some of the serious problems and limitations of the METR time horizons graph are sometimes (but not always) clearly disclosed by METR employees. Note the wide gap between the caveated description of what the graph says and the interpretation of the graph as a strong indicator of rapid, exponential improvement in general AI capabilities.
* Gary Marcus, a cognitive scientist and AI researcher
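For readers unfamiliar with what "extrapolating the time horizons graph" involves, here is a minimal sketch of the procedure under discussion: fit an exponential trend to task-time-horizon data and project it forward. The data points below are invented for illustration; they are not METR's measurements, and the sketch is not AI 2027's actual model. It only shows how strongly the projection depends on the assumption that the fitted exponential continues.

```python
# Sketch of a log-linear (exponential) extrapolation of task time horizons.
# The data points are hypothetical, not METR's; this is not AI 2027's model.

import numpy as np

months = np.array([0, 6, 12, 18, 24], dtype=float)          # hypothetical release dates
horizon_minutes = np.array([2, 4, 9, 17, 35], dtype=float)  # hypothetical 50%-success horizons

# Fit log2(horizon) as a linear function of time: slope = 1 / doubling time.
slope, intercept = np.polyfit(months, np.log2(horizon_minutes), 1)
doubling_months = 1 / slope

def projected_horizon(t_months: float) -> float:
    """Project the fitted exponential trend forward to month t."""
    return 2 ** (intercept + slope * t_months)

print(f"Fitted doubling time: {doubling_months:.1f} months")
print(f"Projected horizon at month 48: {projected_horizon(48):.0f} minutes")
# The headline conclusion is driven almost entirely by the assumption that this
# exponential trend continues -- which is exactly the contested modelling choice.
```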
1
2mo
Technical Alignment Research Accelerator (TARA) applications close today! This is your last chance to apply to join the 14-week, remotely taught, in-person-run program (based on the ARENA curriculum) designed to accelerate APAC talent towards meaningful technical AI safety research. TARA is built so you can learn around full-time work or study: you attend meetings in your home city on Saturdays and do independent study throughout the week. You'll finish the program with a project to add to your portfolio, key technical AI safety skills, and connections across APAC. See this post for more information and apply through our website here.